
DuckDB vs. Polars: Performance & Memory on Massive Parquet Data

20.1.2026 | 14 min read

Update 02.02.26 – Following helpful input from the Polars team on LinkedIn, we extended the benchmark with a Polars configuration that forces async I/O. Details are in the article.

Our previous benchmark compared DuckDB, Polars, and Pandas on a 9 GB CSV dataset. That's a laptop-scale problem. What happens when analytical workloads reach hundreds of gigabytes or terabytes—scales where memory constraints become critical and file layout decisions matter?

This article stress-tests DuckDB and Polars on datasets up to 2 TB, deliberately exploring adversarial scenarios: very large Parquet files (140 GB each), collections of small files (72 × 2 GB), and combinations of both. Rather than identifying a universal winner, we examine how execution models respond to extreme memory pressure and file organization choices.

By holding the logical query pattern constant and varying only dataset size and physical layout, we isolate the effects of file organization and scale on performance. This enables a focused comparison of execution strategies under heavy, scan-intensive workloads where memory management becomes the critical bottleneck.

Methodology

Benchmark Setup

We use the TPC-H lineitem table, an industry-standard benchmark representing order line items with 16 columns including dates, quantities, prices, and categorical fields. This provides a reproducible baseline with realistic schema complexity and data distributions. Individual Parquet files range in size from approximately 2 GB to 140 GB.

All queries follow the same logical pattern: a Parquet scan with projection to three columns, selective filters, and a final row count. The workload is scan-heavy and filter-and-aggregate oriented. It does not evaluate join-heavy or shuffle-intensive queries. Both systems use their default multithreaded execution settings. Filter selectivity was kept constant across all experiments to ensure comparability.

Experimental Environment

All tests were performed on a 2021 MacBook Pro (M1 Max, 32 GB RAM) using Python. We developed a custom benchmarking tool to ensure consistent and reproducible command-line execution, allowing users to specify both the engine and the operation under test. Results were visualized using matplotlib, and key statistical metrics were reported.

Memory Measurement

Initial experiments measured memory usage as the difference in resident set size (RSS) between the start and end of each query. This approach misled us because peak memory usage often occurs mid-execution, temporary buffers may be freed before completion, and operating system caching can distort final RSS measurements.

In this benchmark, memory usage is defined as the maximum RSS observed during query execution, measured relative to the starting RSS. This captures the peak incremental working set required by the query rather than the total process footprint.

We sampled process memory at short intervals of 0.05 seconds and recorded the highest observed value. Each test executed in a fresh process to ensure a consistent baseline. While operating system page caching could not be fully eliminated, this setup reduces persistent in-process caching effects. Every configuration ran ten times, and reported results represent average values.
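The sampling loop can be sketched roughly as follows. This is a simplified single-process version: the actual tool launched each run in a fresh process, and `psutil` is our assumed sampling library here:

```python
import threading
import time

import psutil


def peak_incremental_rss(fn, interval=0.05):
    """Run fn() while sampling this process's RSS every `interval` seconds;
    return the peak RSS observed minus the baseline at start (bytes)."""
    proc = psutil.Process()
    baseline = proc.memory_info().rss
    peak = baseline
    stop = threading.Event()

    def sampler():
        nonlocal peak
        while not stop.is_set():
            peak = max(peak, proc.memory_info().rss)
            time.sleep(interval)

    t = threading.Thread(target=sampler, daemon=True)
    t.start()
    try:
        fn()
    finally:
        stop.set()
        t.join()
    # One final reading in case the peak occurred after the last sample
    peak = max(peak, proc.memory_info().rss)
    return peak - baseline
```

Because the sampler only sees snapshots every 50 ms, very short-lived allocation spikes can slip between samples; for scan-heavy queries lasting seconds to minutes, this granularity is more than sufficient.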

Memory Measurement Nuances: The mmap Effect

Polars uses memory-mapped I/O (mmap) by default for Parquet files on local storage. With mmap, the operating system loads file pages into memory on-demand (lazy loading) and can reclaim them if other processes need memory. This appears as high memory usage in process monitoring tools but differs from permanently allocated memory that cannot be reclaimed.

Our memory measurements capture resident set size (RSS), which includes mmap-backed pages. This represents memory occupied by the process but not necessarily owned exclusively—the OS can evict these pages if needed. For benchmarking purposes, we report peak RSS as this reflects real memory pressure on the system, but we note when mmap significantly contributes to observed usage.

Dataset Generation

The dataset used in this benchmark was generated using DuckDB’s tpch extension, which implements both the data generator and the reference queries for the TPC-H benchmark. The extension is shipped by default with most DuckDB builds and supports easy data generation via scale factors. From the generated dataset, the lineitem table was extracted and exported as a Parquet file using DuckDB.

Test Design

Three experiments isolate different scaling dimensions:

  1. Single-table scaling (2 GB → 140 GB): Tests how engines handle increasing data volume in a single file. This isolates memory management and scanning strategies without file-count effects.

  2. Many small files (up to 72 × 2 GB): Tests file I/O overhead and parallel scan coordination. Real-world data lakes often contain thousands of partitioned files; this scenario evaluates how engines handle file metadata and concurrent reads.

  3. Few very large files (multiple 140 GB files → 2 TB): Combines both challenges—extreme file sizes and multi-file coordination. This represents worst-case scenarios for memory-constrained environments.

In the latter two tests, UNION ALL ensures all input data is processed without deduplication and prevents metadata-only row-count optimizations.

Context: Building on Previous Benchmarks

Our previous article compared these engines on a 9 GB CSV dataset. Key findings:

  • DuckDB: ~300 MB peak memory, fastest execution
  • Polars (lazy + streaming): ~0.5 GB peak memory, competitive speed
  • Pandas: ~10 GB peak memory, slowest execution

Those results suggested Polars could match DuckDB's efficiency at modest scales. This stress test examines whether that parity holds at 10-100× larger scales and with different file formats (Parquet vs CSV).

Results

Across all experiments, the logical query remained identical while the physical layout and total dataset size varied from a few gigabytes to nearly 2 TB. This isolates the effects of file size, file count, and execution strategy on runtime and incremental peak memory.

1. Single-Table Scaling

When a single Parquet file grows from approximately 2 GB to 140 GB, Polars and DuckDB exhibit very similar execution times, with DuckDB maintaining a slight advantage of roughly one second at the largest scale.

polars_duckdb_normal_time.png

DuckDB shows a slow and steady increase in memory usage as data volume grows, peaking at around 1.3 GB. In contrast, Polars' memory usage increases much more aggressively, surpassing 1 GB at roughly 8 GB of data and reaching approximately 17 GB for the ~140 GB dataset.

polars_duckdb_normal_memory.png

This 13× memory difference matters critically in memory-constrained environments. On a 32 GB machine processing the 140 GB dataset, DuckDB leaves ~30 GB free for the OS and other processes, while Polars consumes over half of available RAM.

This result surprised us, particularly given Polars' strong performance in our previous 9 GB benchmark. We investigated further with help from the Polars team and gained additional insight into Polars' internal mechanisms.

Polars' high observed memory consumption stems primarily from mmap page loading. On systems with less available RAM, fewer pages would be loaded, and the apparent memory usage would be lower. Importantly, this memory is not permanently owned by the engine; if the operating system requires it, these pages can be reclaimed.

In typical scenarios, this behavior is entirely reasonable. However, for our specific benchmarking setup, Polars can be configured to behave more similarly to DuckDB. By setting the environment variable POLARS_FORCE_ASYNC=1, Polars switches its default DynByteSource from mmap to ObjectStore for Parquet and IPC readers. This causes Polars to eagerly read bytes into memory instead of relying on lazy page faults via mmap. These settings are optimized for object storage and have not yet been fully optimized for local storage, though the performance impact is expected to be limited.

Adding Polars with forced async to the benchmark shows that this configuration is noticeably slower than both DuckDB and default Polars, requiring nearly three additional seconds for the largest workload.

polars_polars-async_duckdb_normal_time.png

This slower execution time trades performance for improved memory behavior. With forced async enabled, Polars’ memory usage remains comparatively low and grows only modestly with larger workloads—roughly 350 MB for the smallest dataset and around 750 MB for the largest. This contrasts sharply with the dramatic increase observed under the default Polars configuration.

polars_polars-async_duckdb_normal_memory.png

2. Many Small Files

Combining up to 72 Parquet files of approximately 2 GB each results in similar execution times to the single-file case, with DuckDB showing a slightly more pronounced advantage. Interestingly, there is no substantial execution-time difference between default Polars and Polars with forced async in this case.

polars_polars-async_duckdb_stress-small_time.png

DuckDB’s memory usage remains consistently below 70 MB, increasing only marginally as data volume grows. Polars, by contrast, shows a sharp increase in memory consumption—from about 400 MB at 2 GB to roughly 4.3 GB at 40 GB—after which usage largely plateaus. Similar to execution time, both Polars configurations behave more similarly in this test than in the others, scaling in a comparable manner and crossing over at around 100 GB of data, where the default Polars setup becomes slightly more memory-efficient than the async variant.

polars_polars-async_duckdb_stress-small_memory.png

3. Few Very Large Files

Execution-time trends in this scenario are similar to those observed in the second test. For the ~2 TB dataset, Polars completes the workload in approximately one minute, while DuckDB finishes in roughly 45 seconds. Polars with forced async again requires significantly more time, approaching 100 seconds for the largest workload.

polars_polars-async_duckdb_stress-big_time.png

Polars’ memory usage scales from about 1.7 GB to 23 GB across the tested volumes. DuckDB, in contrast, increases from approximately 500 MB to 2.4 GB, maintaining a significantly lower peak memory footprint throughout. With forced async enabled, Polars stays closer to DuckDB, showing a difference of roughly 200 MB for the smallest workload and around 5 GB for the largest. For the largest dataset, forced async reduces Polars’ peak memory usage by up to 2.8×.

polars_polars-async_duckdb_stress-big_memory.png

4. File Layout Impact: Why Partitioning Matters

Comparing identical data volumes across different file layouts reveals a critical finding: partitioning dramatically reduces memory usage for both engines. We compare the single-table scenario with the many-small-files scenario—both represent similar total data volumes but differ in file layout.

Aside from Polars with forced async, no configuration shows a significant execution-time improvement from partitioning data into smaller files.

polars_duckdb_overlay_time.png

Both engines benefit substantially in terms of memory usage. With partitioned workloads, DuckDB reduces its peak memory consumption by up to 8×, while Polars achieves reductions of up to 4×. Polars with forced async performs noticeably better on single large files—even approaching DuckDB's memory efficiency—than when processing many smaller files.

polars_duckdb_overlay_memory.png

Practical Implications

For the same ~140 GB of data:

  • DuckDB: 1.3 GB peak (single file) → 160 MB peak (72 small files) — 8× reduction
  • Polars: 17 GB peak (single file) → 4.3 GB peak (72 small files) — 4× reduction

File organization is not just a performance concern—it's a memory safety issue. Data engineering teams working with large datasets should consider partitioning strategies even when distributed processing isn't required. A monolithic 140 GB Parquet file might work fine in DuckDB but could cause OOM failures with Polars on typical developer laptops.

Discussion

Memory Behavior

Across all experimental configurations, DuckDB’s incremental peak memory usage remained below approximately 2.5 GB, even when processing nearly 2 TB of Parquet data. In contrast, Polars exhibited substantially higher peak memory consumption on very large input files, reaching around 20 GB in some cases. When the same data was partitioned into many smaller files, the incremental peak memory usage of both engines decreased significantly.

These results align with DuckDB’s documentation, which recommends Parquet file sizes between 100 MB and 10 GB. Interestingly, Polars with forced async not only failed to benefit from partitioning in this setup, but actually lost its competitive edge relative to DuckDB.

Polars and Large Files

While Polars achieves competitive execution times, its peak memory usage increases rapidly for large files. The same workload becomes significantly more memory-efficient when we split it into many smaller files, suggesting that Polars' bottleneck lies in row-group and page-level buffering combined with multithreaded scanning and decompression.

In Arrow-style Parquet reading, decompression occurs at the page and column-chunk level, and buffers can be sizable. When we partition workloads into smaller files, fewer row groups are processed concurrently, reducing the worst-case sum of decode, mask, and materialization buffers. Switching to forced async can mitigate this behavior by trading execution speed for reduced memory pressure.

DuckDB’s architecture, by contrast, enforces strict memory limits through its buffer manager. Although DuckDB also decodes Parquet pages, its execution model is designed to keep the working set bounded and to stream or evict data aggressively, preventing large transient memory spikes.

Peak vs. Final Memory Usage

Polars allocates substantial transient memory during query execution (primarily for scanning, decompression, and intermediate buffers) but releases most of it afterward, resulting in a post-execution footprint close to baseline; this behavior is largely attributable to mmap. DuckDB maintains a tighter bound on memory usage throughout execution. In memory-constrained environments, peak usage matters more than final usage, as it determines the risk of out-of-memory failures during execution.

Polars with forced async behaves more like DuckDB in this respect, as it relies on traditional read/write I/O, which is better suited for streaming workloads.

Row Group Size

Row group size plays a critical role in DuckDB's performance. DuckDB parallelizes Parquet scans at the row-group level; consequently, files with a single large row group can only be processed by a single thread. DuckDB recommends row groups of 100 K–1 M rows and warns that sizes below 5,000 rows can degrade performance by 5–10×.

Polars' default row group size is approximately 250 K rows. To account for this, we explicitly generated Parquet files with a row group size of 300,000 rows.

Using optimized row groups significantly improved execution time for both engines:

Impact of Row Group Size Optimization (2 TB dataset):

| Configuration                | Polars      | DuckDB      |
| ---------------------------- | ----------- | ----------- |
| Small row groups (default)   | 170 s       | 70 s        |
| Optimized row groups (300 K) | 60 s        | 44 s        |
| Improvement                  | 2.8× faster | 1.6× faster |

DuckDB's memory usage also dropped substantially, while Polars' memory usage remained relatively stable. This demonstrates that even with engine-agnostic formats like Parquet, physical layout decisions implicitly favor certain execution engines. Row group sizing is not just a tuning parameter—it's a first-class performance determinant.

Memory Usage Does Not Equal "Memory Usage"

Because Polars relies heavily on mmap by default, it is not entirely accurate to attribute observed memory usage solely to the engine itself—the operating system accounts for much of it. From a user's perspective, however, this distinction is largely academic: if a process occupies a given amount of memory and that becomes a system constraint, it makes little difference where the memory is technically allocated.

Polars' use of mmap is not a flaw, nor an argument against the engine. Both Polars and mmap are industry-standard tools used for good reason. Our benchmark simply exposes this design choice as a disadvantage in a very specific scenario. As always, benchmarks represent only one piece of a much larger puzzle and should be interpreted with care.

Conclusion

DuckDB and Polars both deliver strong performance on terabyte-scale analytical workloads, but they offer different configurations with distinct tradeoffs between memory efficiency and throughput.

DuckDB prioritizes memory efficiency. Its buffer manager enforces strict limits, keeping peak memory below 2.5 GB even when processing nearly 2 TB of data. This makes DuckDB the safer choice for memory-constrained environments, unpredictable workloads, and production ETL pipelines where out-of-memory failures are unacceptable.

Polars (default) prioritizes throughput. Its mmap strategy leverages OS page caching for fast reads but trades memory footprint for speed. On the 140 GB single-file test, default Polars consumed 17 GB peak memory but maintained competitive execution times. When data is well-partitioned into smaller files, memory usage drops to 4.3 GB—still higher than DuckDB but acceptable when memory headroom is available. This configuration makes Polars ideal for DataFrame-oriented workflows on well-partitioned datasets.

Polars with forced async offers a third option. By setting POLARS_FORCE_ASYNC=1, Polars switches from mmap to traditional I/O, achieving memory usage comparable to DuckDB (750 MB for 140 GB files). However, this trades speed for memory efficiency—the 2 TB workload takes 100 seconds versus 60 seconds for default Polars and 44 seconds for DuckDB. Counterintuitively, forced async performs better on single large files than on many small files, making it suitable for scenarios where large monolithic files cannot be repartitioned and memory constraints are tight.

The configuration choice depends on your constraints:

| Configuration         | Peak Memory (140 GB file)    | Execution Speed | Best For                                                         |
| --------------------- | ---------------------------- | --------------- | ---------------------------------------------------------------- |
| DuckDB                | 1.3 GB                       | Fastest         | Memory-constrained (16–32 GB), production stability              |
| Polars (default)      | 17 GB → 4.3 GB (partitioned) | Fast            | Well-partitioned data                                            |
| Polars (forced async) | 750 MB                       | Slowest         | Memory-constrained with large files that cannot be repartitioned |

One caveat must be kept in mind: the memory Polars uses in the default configuration depends on how much free memory the system has. If plenty of memory is available, as in our case, mmap will use it; if memory is scarce, mmap will use significantly less. The forced-async configuration showed that the engine itself needs only around 750 MB; the excess is consumed by mmap and therefore varies with the free memory available.

The most surprising finding: file layout matters more than engine choice for memory usage. Partitioning the same 140 GB dataset into 72 smaller files reduced DuckDB's memory by 8× and default Polars' by 4×. Data engineers should treat Parquet file organization as a first-class architectural decision, not just a storage detail.

Although Parquet is an open, engine-agnostic format, physical layout decisions—file size, row group size, compression—meaningfully affect execution behavior. The abstraction is leakier than commonly assumed. In practice, storage and compute remain tightly coupled.

If you are interested in leveraging DuckDB's capabilities in the cloud, consider enrolling in our on-demand Hands-on Workshop: Introduction to MotherDuck for a complete practical walkthrough.
