Building customer-facing analytics, i.e. analytics embedded directly into the web application your customers use, has always been tricky. The requirements differ from in-house BI: users expect sub-second responsiveness, queries can be unpredictable to a certain degree, and it all must work without overloading operational systems. A few caveats:
- When you build directly on your production infrastructure and database, there is a good chance the analytical workload will overload those systems. Those technologies also rarely excel at analytical query patterns; they are optimised for classic transactional queries.
- Specialised analytical systems like data warehouses are often heavyweight: expensive, slow to provision, and focused on longer-running queries.
- Even serverless query engines like Athena are quite complex. They require setup, IAM configuration and tuning that are overkill for simple embedded analytics.
- Overall, the client-server model remains challenging. Managing clusters and configurations adds operational overhead.
- The user is not interested in any of this. They just want fast, snappy dashboards that can easily be filtered along different dimensions.
- Especially in SaaS scenarios you need to ensure tenant separation. Not only should each tenant see only their own data; it is often also required that the data is stored separately.
The result: teams either overspend on heavyweight solutions or underdeliver on responsiveness. Often the solution is built on application infrastructure and technology that was never made for these cases.
How MotherDuck and DuckDB Help
MotherDuck extends DuckDB — the in-process analytical database — into the cloud. Together, they offer a lightweight yet powerful model that is well-suited for customer-facing use cases:
- Lightweight architecture: No heavy OLAP cluster, no complex IAM, no server fleet to manage. DuckDB runs embedded in your app or service, and MotherDuck adds shared, cloud-scale persistence.
- Decoupled storage and compute: Compute is only consumed when a query is executed. The data itself can be stored in inexpensive blob storage, managed by yourself or by MotherDuck.
- Tenant separation made easy: In addition to common SQL-based strategies like separate schemas per tenant, DuckDB lets you go one step further and use one database per tenant. Each database can be a separate file, stored in its own path. In MotherDuck you can take this even further and assign every tenant a separate duckling, a compute instance that can be scaled independently, so every tenant gets the compute power it needs.
- Dual query execution: You can query local data (e.g. Parquet, CSV, cached DuckDB tables), data stored with your cloud provider, and remote data in MotherDuck's cloud in the same SQL statement. MotherDuck distributes the query workload so that it is "data-local": the parts of the query are executed where the data is stored, and network transfers are minimized.
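The one-database-per-tenant strategy can be sketched in application code. The following is a minimal, hypothetical routing helper in Python; the naming scheme (`tenant_<id>`) and the local file path are assumptions for illustration, only the `md:` connection prefix is MotherDuck's.

```python
# Hypothetical per-tenant database routing helper (the naming scheme and
# paths are illustrative assumptions, not a MotherDuck API).
import re

def tenant_database(tenant_id: str, use_motherduck: bool = True) -> str:
    """Map a tenant id to its own isolated DuckDB database.

    Each tenant gets a dedicated database (one file locally, or one
    MotherDuck database in the cloud), so data is physically separated.
    """
    # Allow only a safe character set to avoid path or SQL injection.
    if not re.fullmatch(r"[a-z0-9_]+", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    if use_motherduck:
        return f"md:tenant_{tenant_id}"         # one MotherDuck database per tenant
    return f"/data/tenants/{tenant_id}.duckdb"  # one local file per tenant
```

The returned string would then be passed to `duckdb.connect(...)`, so every request is served from the connection belonging to that tenant only.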
Dual query execution can also be used for caching tables. Customer-facing analytics often runs on a specific dataset where only filters or aggregations change; the underlying dataset stays the same. With dual query execution one can easily build a system where the first query reads the data from a cloud source like MotherDuck, caches it locally, and serves subsequent queries from the cache. This uses DuckDB's ability to create a local table from a query:
CREATE TABLE main.local_cache AS SELECT * FROM ({userQuery})
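Wrapped in application code, the cache-on-first-read pattern could look like the sketch below. The class and names are hypothetical; `run_sql` stands in for a DuckDB connection's `execute`, injected so the logic stays independent of the actual connection.

```python
# Sketch of the cache-on-first-read pattern (class and names are illustrative).
class LocalCache:
    """Materialize a remote query result in a local table on first access."""

    def __init__(self, run_sql, cache_table: str = "main.local_cache"):
        self.run_sql = run_sql          # e.g. a duckdb connection's execute
        self.cache_table = cache_table
        self.filled = False

    def query(self, user_query: str, local_sql: str = None):
        if not self.filled:
            # First call: pull the data from the remote source, cache it locally.
            self.run_sql(
                f"CREATE OR REPLACE TABLE {self.cache_table} AS "
                f"SELECT * FROM ({user_query})"
            )
            self.filled = True
        # Subsequent reads (filters, aggregations) run against the local copy.
        return self.run_sql(local_sql or f"SELECT * FROM {self.cache_table}")
```

In production, `run_sql` would be the `execute` method of a connection attached to a `md:` database, so the first call pulls from MotherDuck and every later call stays local.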
Example: Local Caching + Cloud Data
Here’s a simple SQL example showing how to combine local caching and remote cloud data with MotherDuck:
ATTACH 'md:sample_data';

CREATE OR REPLACE TABLE main.local_cache AS
SELECT *
FROM
  (SELECT date_trunc('day', created_date) AS date,
          agency,
          count(*) AS request_count
   FROM sample_data.nyc.service_requests
   WHERE created_date >= '2021-01-01'
     AND created_date <= '2021-01-31'
   GROUP BY ALL
   ORDER BY 1 ASC);

EXPLAIN
SELECT *
FROM main.local_cache;

SELECT agency,
       sum(request_count)
FROM main.local_cache
WHERE date > '2021-01-16'
GROUP BY ALL
ORDER BY 2 DESC;
What’s Happening Here
We run a query over a small time slice of data that is stored remotely in MotherDuck. The result is cached locally as a table.
The EXPLAIN of a SELECT on the cached table shows that it is processed locally, indicated by (L) next to the operator name, here the SEQ_SCAN.
Query plan for a simple local SELECT
We then narrow down the data further: additional group-bys, more filtering, and a different sort order, all running locally.
Since DuckDB is available in all relevant programming languages (Python, TypeScript, Java, Rust, Go, PHP and more), this can easily be extended to a pattern where a starting dataset is loaded once, queries on this data are executed locally, and more data is loaded from remote only when needed.
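The "load more only when needed" decision can be kept in a small piece of bookkeeping code. The sketch below is a hypothetical helper that tracks which date range is already cached locally, so the application knows whether a dashboard request can be answered from the cache or requires a remote fetch; the class and its interface are assumptions, not part of DuckDB or MotherDuck.

```python
# Illustrative bookkeeping for the local-first query pattern: track the
# locally cached date range and decide local vs. remote per request.
from datetime import date

class DateRangeCache:
    """Remember which date range is cached locally."""

    def __init__(self):
        self.start = None
        self.end = None

    def mark_cached(self, start: date, end: date):
        """Record the range that was just materialized into the local table."""
        self.start, self.end = start, end

    def covers(self, start: date, end: date) -> bool:
        """True if the whole requested range can be served from the cache."""
        if self.start is None:
            return False
        return self.start <= start and end <= self.end

cache = DateRangeCache()
cache.mark_cached(date(2021, 1, 1), date(2021, 1, 31))
cache.covers(date(2021, 1, 10), date(2021, 1, 20))  # -> True, serve locally
cache.covers(date(2021, 1, 10), date(2021, 2, 15))  # -> False, fetch remotely
```

When `covers` returns False, the application would issue the remote (or dual-execution) query and extend the cache; when it returns True, the query runs purely against the local table.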
For the last part one can use MotherDuck's dual execution functionality, which makes it possible to combine local and remote data:
SELECT *
FROM main.local_cache
UNION
SELECT date_trunc('day', created_date) AS date,
       agency,
       count(*) AS request_count
FROM sample_data.nyc.service_requests
WHERE created_date >= '2021-01-02'
  AND created_date <= '2021-03-31'
GROUP BY ALL;
Running this query with EXPLAIN shows the physical plan. The "Download Source" operator indicates that part of the data is downloaded from MotherDuck and then unioned with the local data.
Query plan for a SELECT UNION over local and remote data
Takeaways
For customer-facing analytics, the balance is always between speed, cost, and simplicity. MotherDuck and DuckDB hit a sweet spot:
- No heavyweight infrastructure to operate.
- No overloaded OLTP databases.
- No complex cluster tuning.
- Just fast, federated analytics, with explicit caching where it matters most.
If you’re building last-mile analytics into your SaaS product or customer portal, you don’t need to choose between an overloaded OLTP backend and an overengineered OLAP system. With MotherDuck and DuckDB, you can cache hot queries locally for instant dashboards and use cloud compute power for more complex queries. The result: analytics that is fast for the end user, yet simple to build and maintain for the engineers.
Blog author
Matthias Niehoff
Head of Data