Talk to your Data Part 2: Limits and Performance Enhancements

19.2.2026 | 8 minutes reading time

In part one of this series, we introduced the MotherDuck MCP server in combination with opencode and showcased initial context engineering. We also showed deeper knowledge retrieval using natural language instead of SQL. In this article, we dive further into these retrieval capabilities and test the limits of prompting and precise formulation when analyzing data from a chat interface.

For context: we will be using the same weather dataset for Munich covering the last ten years (2015-2025), along with sensor measurements that monitor cycling activity in Munich over the same period.

Finding Edge Cases and Forcing Hallucinations

We have probably all dealt with ChatGPT, Claude or Gemini hallucinating or presenting incorrect facts with complete confidence. This makes it hard to separate fact from fiction when the underlying data or sources are not available to us. We wanted to test the robustness of our setup and find information in a dataset we do not know well. This would show us how quickly we can get familiar with new data and whether we can verify the claims made about our data afterwards.

We started where we left off in the first part, having created a baseline context and analyzed heat development during the summer months. Next, we decided to explore the connection between weather and cycling activity. To avoid overloading the context, we limited our prompt to the year 2025.

The prompt: "Use the MotherDuck query tool to find a connection between weather and cycling activity for the year 2025. Identify the columns for dates, bike counts, and locations in the sensor data along with the temperature, rain, and wind columns in the weather data. Look for relationships between daily bike traffic and weather conditions throughout the year. Identify specific tipping points like a certain temperature or amount of rain where cycling volume changes significantly. Present the findings in a table with a plain text summary explaining how weather patterns drove cycling activity in 2025."

The agent executed seven queries and took about one minute to complete. It correctly identified the daily weather aggregations table and used it instead of the hourly weather data to avoid unnecessary queries. The queries still contained redundancy, however: CTEs were frequently recreated within individual queries because results from previous queries were not available to the agent.

The first major issue appeared in the results. The database contained datasets for multiple locations, and the agent chose the wrong table as its source of weather data, linking weather observations from other parts of Germany to the cycling activity in Munich. After explicitly stating that the Munich weather dataset should be used and that the cycling dataset measures activity in Munich, we received different results.

The response contained several tables showing correlations between temperature, observed rainfall, the average number of cyclists under these conditions, minimum and maximum values, and an indication of how much the observed cyclist counts varied under those conditions.

We noticed that the agent did not correctly account for the fact that several stations measure activity for the same day. For example, it reported 137 days below zero degrees in Munich. This happened even after we had built up a context window and asked the agent to inspect the cycling activity tables beforehand.
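
A plausible explanation (our assumption, the agent did not state this) is that it counted station-day rows rather than distinct days: with five stations reporting per day, one sub-zero day can turn into five. A minimal sketch with hypothetical table and column names illustrates the difference:

-- Hypothetical schema: bike_counts_2025(measurement_date, station, bike_count)
--                      weather_munich_daily(measurement_date, temp_min)

-- Inflated count: after the join, every station contributes one row per cold day.
SELECT COUNT(*) AS cold_days_inflated
FROM bike_counts_2025 b
JOIN weather_munich_daily w USING (measurement_date)
WHERE w.temp_min < 0;

-- Correct count: deduplicate by date before counting.
SELECT COUNT(DISTINCT measurement_date) AS cold_days
FROM weather_munich_daily
WHERE temp_min < 0;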

This episode clearly shows that sufficient annotation is necessary to prevent hallucinations, assumptions, and guesswork during ad hoc analysis using natural language. Even with prior context priming, the agent ran many queries and generated context and content that it reported to us with high confidence and nice formatting. However, it fell apart when we asked follow-up questions like "Are there really so many days below zero degrees Celsius in Munich in 2025?".

Even after several corrections, the report remained inaccurate. This led us to conclude that very specific questions on aggregated data, or general exploration tasks at the table level, are better use cases than sophisticated analyses that require extensive meta knowledge about the data or detailed annotations for the agent to inspect.

Testing with Annotated Data

DuckDB (and therefore MotherDuck) offers the ability to add comments to tables, columns and views. You can add comments by running SQL in the MotherDuck web app or via a custom script that uses the DuckDB client and establishes a connection to MotherDuck with a token. We added comments to each table and column to potentially increase retrieval performance. The comments described the general purpose of the tables and the contents of each column.
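
As an illustration, adding such comments might look like the following sketch; the table and column names are placeholders for our dataset, and the same statements can be run in the MotherDuck web app or through a DuckDB client connected to MotherDuck with a token:

-- Placeholder names; adapt to your own schema.
COMMENT ON TABLE bike_counts_2025 IS
    'Daily bike counts for Munich in 2025, one row per counting station and day.';
COMMENT ON COLUMN bike_counts_2025.station IS
    'Name of the counting station; five stations report values for the same day.';
COMMENT ON COLUMN bike_counts_2025.bike_count IS
    'Number of bikes registered at the station on that day.';
COMMENT ON TABLE weather_munich_daily IS
    'Daily weather aggregations for Munich (2015-2025), one row per day.';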

To make the analysis comparable despite the ambiguity of natural language and the varying behavior of generative AI models, we recreated the dataset with annotations and started a new session with the agent. We rebuilt the same context window with the same prompts up to the point where retrieval performance had degraded earlier. Our goal was to drastically improve performance and prevent the assumptions that had appeared before. The problematic query improved somewhat without additional intervention, but the agent focused on only one of the measuring stations even though five exist in the dataset. This decision changed the result drastically and omitted information about the other measurement stations, resulting in an incomplete analysis.

We followed up with this prompt to clarify our intent: "Can you check the comments on the 2025 bike activity table?" Only after explicitly mentioning the comments and forcing the agent to inspect them did the results become accurate and insightful. We learned that cycling activity increased when the temperature rose above 15 degrees and that this increase was identical across all stations. Furthermore, rain has a significant impact on cycling activity but is tolerated up to a certain point. We validated these findings against the actual data.
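
For reference, the comments the agent was asked to inspect can also be read directly through DuckDB's metadata functions, assuming a DuckDB version recent enough to expose comments there (0.10 or later); the table name below is again a placeholder:

-- Table-level comment.
SELECT table_name, comment
FROM duckdb_tables()
WHERE table_name = 'bike_counts_2025';

-- Column-level comments.
SELECT column_name, comment
FROM duckdb_columns()
WHERE table_name = 'bike_counts_2025';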

Exhaustive Prompting vs. Iterative Conversation

When we think of "talking to our data," we often imagine asking simple questions and receiving direct, reliable answers. Our initial query, however, was relatively verbose, instructing the agent to identify columns, analyze relationships, find tipping points, and format the results in a specific way. This led us to explore whether a more minimal approach could achieve similar results.

We started a fresh session with the annotated database, provided the same baseline context as before, and reduced our query to a single sentence: "Look for relationships between daily bike traffic and weather conditions throughout the year 2025 and identify tipping points." This approach was largely exploratory and we did not expect particularly strong results. Surprisingly, the agent produced highly accurate insights that matched the quality of our verbose prompt.

Encouraged by this outcome, we experimented with a third approach. The MotherDuck documentation recommends breaking complex queries into smaller, conversational steps rather than asking everything at once. We tested this iterative method against both our original detailed query and the concise single-question prompt. While this recommendation has merit for general use, it did not significantly improve results in our case. The core correlations between weather conditions and cycling activity remained consistent across all three approaches, and the accuracy of identifying weather-cycling tipping points varied by only approximately 2% between annotated and non-annotated datasets. Please refer to our detailed summary of findings for more information.

Although the factual outcomes were largely stable, we observed notable differences in what the agent chose to emphasize. The iterative conversation approach led the agent to focus on rain duration as a key variable. The detailed comprehensive query produced more station-level breakdowns. The simplified prompt encouraged broader behavioral pattern analysis across all stations.

These variations are both intriguing and impressive, but they highlight an inherent challenge: without carefully structured prompting, the analyst has limited control over the agent's interpretive focus. In our case, the simplified query happened to align well with our analytical intent. However, repeating the same prompt under identical conditions may yield different emphases, some potentially more insightful, others less so. This non-determinism is characteristic of LLM-based systems and requires verification of results regardless of prompt complexity.

We evaluated all three prompting strategies on both annotated and non-annotated versions of the database. Annotations did not substantially increase accuracy. They expanded the agent's exploratory scope instead. With comments present, the agent incorporated variables such as rain duration, sunshine hours, and weekday–weekend distinctions into its analysis. Without annotations, these columns were often ignored, likely because their meaning was less transparent from column names alone.

Conclusion

We have seen that annotations and exact prompting are necessary to retrieve specific information or to create ad hoc analyses that ask for more than facts that are already aggregated and self-explanatory. Issues with these queries can be remediated, but catching them requires deeper inspection of both the results and the generated queries. This is where the natural language capabilities end, or at least demand very specific formulations about the data that would have to be acquired before prompting. Even when results were interpreted incorrectly or the generated SQL narrowed the scope of our analysis prematurely, we could leverage it to easily run and adapt our own queries and generate insights about our data.

While our testing yielded insightful results, any work with agents is of course a variable matter: the same setup can lead to different results. Nevertheless, MCP holds incredible general potential, which we will discuss in our next and final part, together with an outlook on its place in the context of data architecture and data technology stacks.
