Building customer-facing analytics, i.e. analytics embedded directly into the web application your customers use, has always been tricky. The requirements differ from in-house BI: users expect sub-second responsiveness, queries can be unpredictable to a certain degree, and it all must work without overloading operational systems. A few caveats:
- When you build directly on your production infrastructure and database, there is a good chance the analytical workload will overload those systems. Those technologies also rarely excel at analytical query patterns; they are optimised for classic transactional queries.
- Specialised analytical systems like data warehouses are often heavyweight: expensive, slow to provision, and focused on longer-running queries.
- Even serverless query engines like Athena are quite complex. They require setup, IAM configuration and tuning that are overkill for simple embedded analytics.
- Overall, the client-server model remains challenging. Managing clusters and configurations adds operational overhead.
- The user is not interested in any of this. They just want fast, snappy dashboards that can easily be filtered along different dimensions.
- Especially in SaaS scenarios you need to ensure tenant separation. Not only should each tenant see only their own data; it is often also required that the data is stored separately.
The result: teams either overspend on heavyweight solutions or underdeliver on responsiveness. Often the solution is built on application infrastructure and technology that was never made for these cases.
How MotherDuck and DuckDB Help
MotherDuck extends DuckDB — the in-process analytical database — into the cloud. Together, they offer a lightweight yet powerful model that is well-suited for customer-facing use cases:
- Lightweight architecture: No heavy OLAP cluster, no complex IAM, no server fleet to manage. DuckDB runs embedded in your app or service, and MotherDuck adds shared, cloud-scale persistence.
- Decoupled storage and compute: Compute is only consumed when a query is executed. The data itself can be stored in inexpensive blob storage, managed by yourself or by MotherDuck.
- Tenant separation made easy: In addition to common SQL-based strategies like separate schemas per tenant, DuckDB lets you go one step further and use one database per tenant. Each database can be a separate file, stored in its own path. In MotherDuck you can take this even further and assign every tenant a separate duckling, a compute instance that can be scaled independently, so every tenant gets the compute power it needs.
- Dual query execution: You can query local data (e.g. Parquet, CSV, cached DuckDB tables), data stored with your cloud provider, and remote data in MotherDuck's cloud in the same SQL statement. MotherDuck distributes the query workload so that it is "data-local": the parts of the query are executed where the data is stored, and network transfers are minimized.
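The one-database-per-tenant strategy can be sketched in application code. The following is a minimal, hypothetical routing helper in Python; the naming scheme (`tenant_<id>`) and the local file path are assumptions for illustration, only the `md:` connection prefix is MotherDuck's.

```python
# Hypothetical per-tenant database routing helper (the naming scheme and
# paths are illustrative assumptions, not a MotherDuck API).
import re

def tenant_database(tenant_id: str, use_motherduck: bool = True) -> str:
    """Map a tenant id to its own isolated DuckDB database.

    Each tenant gets a dedicated database (one file locally, or one
    MotherDuck database in the cloud), so data is physically separated.
    """
    # Allow only a safe character set to avoid path or SQL injection.
    if not re.fullmatch(r"[a-z0-9_]+", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    if use_motherduck:
        return f"md:tenant_{tenant_id}"         # one MotherDuck database per tenant
    return f"/data/tenants/{tenant_id}.duckdb"  # one local file per tenant
```

The returned string would then be passed to `duckdb.connect(...)`, so every request is served from the connection belonging to that tenant only.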
Dual query execution can also be used for caching tables. Customer-facing analytics often runs on a specific dataset where only filters or aggregations change; the underlying dataset stays the same. With dual query execution one can easily build a system where the first query reads the data from a cloud source like MotherDuck, caches it locally, and serves subsequent queries from the cache. This uses DuckDB's ability to create a local table from a query:
CREATE TABLE main.local_cache AS SELECT * FROM ({userQuery})
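Wrapped in application code, the cache-on-first-read pattern could look like the sketch below. The class and names are hypothetical; `run_sql` stands in for a DuckDB connection's `execute`, injected so the logic stays independent of the actual connection.

```python
# Sketch of the cache-on-first-read pattern (class and names are illustrative).
class LocalCache:
    """Materialize a remote query result in a local table on first access."""

    def __init__(self, run_sql, cache_table: str = "main.local_cache"):
        self.run_sql = run_sql          # e.g. a duckdb connection's execute
        self.cache_table = cache_table
        self.filled = False

    def query(self, user_query: str, local_sql: str = None):
        if not self.filled:
            # First call: pull the data from the remote source, cache it locally.
            self.run_sql(
                f"CREATE OR REPLACE TABLE {self.cache_table} AS "
                f"SELECT * FROM ({user_query})"
            )
            self.filled = True
        # Subsequent reads (filters, aggregations) run against the local copy.
        return self.run_sql(local_sql or f"SELECT * FROM {self.cache_table}")
```

In production, `run_sql` would be the `execute` method of a connection attached to a `md:` database, so the first call pulls from MotherDuck and every later call stays local.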
Example: Local Caching + Cloud Data
Here’s a simple SQL example showing how to combine local caching and remote cloud data with MotherDuck:
ATTACH 'md:sample_data';

CREATE OR REPLACE TABLE main.local_cache AS
SELECT *
FROM
  (SELECT date_trunc('day', created_date) AS date,
          agency,
          count(*) AS request_count
   FROM sample_data.nyc.service_requests
   WHERE created_date >= '2021-01-01'
     AND created_date <= '2021-01-31'
   GROUP BY ALL
   ORDER BY 1 ASC);

EXPLAIN
SELECT *
FROM main.local_cache;

SELECT agency,
       sum(request_count)
FROM main.local_cache
WHERE date > '2021-01-16'
GROUP BY ALL
ORDER BY 2 DESC;
What’s Happening Here
We run a query over a small time slice of data that is stored remotely in MotherDuck. The result is cached locally as a table.
The EXPLAIN of a SELECT on the cached table shows that it is processed locally, indicated by (L) next to the operator name, here the SEQ_SCAN.
Query plan for a simple local SELECT
We then narrow down the data further: additional group-bys, more filtering, and a different sort order, all running locally.
Since DuckDB is available in all relevant programming languages (Python, TypeScript, Java, Rust, Go, PHP and more), this can easily be extended to a pattern where a starting dataset is loaded once, queries on this data are executed locally, and more data is loaded from remote only when needed.
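The "load more only when needed" decision can be kept in a small piece of bookkeeping code. The sketch below is a hypothetical helper that tracks which date range is already cached locally, so the application knows whether a dashboard request can be answered from the cache or requires a remote fetch; the class and its interface are assumptions, not part of DuckDB or MotherDuck.

```python
# Illustrative bookkeeping for the local-first query pattern: track the
# locally cached date range and decide local vs. remote per request.
from datetime import date

class DateRangeCache:
    """Remember which date range is cached locally."""

    def __init__(self):
        self.start = None
        self.end = None

    def mark_cached(self, start: date, end: date):
        """Record the range that was just materialized into the local table."""
        self.start, self.end = start, end

    def covers(self, start: date, end: date) -> bool:
        """True if the whole requested range can be served from the cache."""
        if self.start is None:
            return False
        return self.start <= start and end <= self.end

cache = DateRangeCache()
cache.mark_cached(date(2021, 1, 1), date(2021, 1, 31))
cache.covers(date(2021, 1, 10), date(2021, 1, 20))  # -> True, serve locally
cache.covers(date(2021, 1, 10), date(2021, 2, 15))  # -> False, fetch remotely
```

When `covers` returns False, the application would issue the remote (or dual-execution) query and extend the cache; when it returns True, the query runs purely against the local table.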
For the last part one can use MotherDuck's dual execution functionality, which makes it possible to combine local and remote data:
SELECT *
FROM main.local_cache
UNION
SELECT date_trunc('day', created_date) AS date,
       agency,
       count(*) AS request_count
FROM sample_data.nyc.service_requests
WHERE created_date >= '2021-01-02'
  AND created_date <= '2021-03-31'
GROUP BY ALL;
Running this query with EXPLAIN shows the physical plan. The "Download Source" operator indicates that part of the data is downloaded from MotherDuck and then unioned with the local data.
Query plan for a SELECT UNION over local and remote data
Takeaways
For customer-facing analytics, the balance is always between speed, cost, and simplicity. MotherDuck and DuckDB hit a sweet spot:
- No heavyweight infrastructure to operate.
- No overloaded OLTP databases.
- No complex cluster tuning.
- Just fast, federated analytics, with explicit caching where it matters most.
If you’re building last-mile analytics into your SaaS product or customer portal, you don’t need to choose between an overloaded OLTP backend and an overengineered OLAP system. With MotherDuck and DuckDB, you can cache hot queries locally for instant dashboards and use cloud compute power for more complex queries. The result: analytics that is fast for the end user, yet simple to build and maintain for the engineers.
Blog author
Matthias Niehoff
Head of Data