Our previous comparison of DuckDB, Polars, and Pandas focused primarily on modest dataset sizes, using 9 GB CSV benchmarks. The present article extends this evaluation to substantially larger scales, concentrating on DuckDB and Polars under extreme analytical workloads derived from the TPC-H benchmark.
Rather than emphasizing idealized storage layouts or moderate data volumes, this work deliberately explores adversarial scenarios, including very large Parquet files, large collections of small files, and combinations of both. The objective is not to identify a universally superior system, but to examine how different execution models respond to increasing data volume, file counts, and memory pressure.
By holding the logical query pattern constant and varying only dataset size and physical layout, this study isolates the effects of file organization and scale on performance. This enables a focused comparison of execution strategies under heavy, scan-intensive workloads.
Methodology
Benchmark Setup
The benchmark uses the TPC-H lineitem table stored in Parquet format. Individual files range in size from approximately 2 GB to 140 GB. All queries follow the same logical pattern: a Parquet scan with projection to three columns, selective filters, and a final row count.
The workload is therefore scan-heavy and filter-and-aggregate oriented. It does not evaluate join-heavy or shuffle-intensive queries. Both systems were allowed to use their default multithreaded execution settings. Filter selectivity was kept constant across all experiments to ensure comparability.
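To make the query pattern concrete, the sketch below shows what such a scan-filter-count query could look like in both engines. The exact column names and filter predicates used in the benchmark are not reproduced here, so the columns (l_shipdate, l_quantity, l_discount), the thresholds, and the file path are purely illustrative.

```python
from datetime import date

import duckdb
import polars as pl

PATH = "lineitem.parquet"  # placeholder path

# DuckDB: count with selective filters; only the filtered columns are scanned
duck_count = duckdb.sql(
    f"""
    SELECT count(*)
    FROM read_parquet('{PATH}')
    WHERE l_shipdate <= DATE '1998-09-02'
      AND l_quantity < 24
    """
).fetchone()[0]

# Polars: equivalent lazy scan with projection to three columns, filters, and a row count
polars_count = (
    pl.scan_parquet(PATH)
    .select(["l_shipdate", "l_quantity", "l_discount"])
    .filter(
        (pl.col("l_shipdate") <= date(1998, 9, 2))
        & (pl.col("l_quantity") < 24)
    )
    .select(pl.len())
    .collect()
    .item()
)
```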
Experimental Environment
All tests were performed on a 2021 MacBook Pro (M1 Max, 32 GB RAM) using Python. A custom benchmarking tool was developed to ensure consistent and reproducible command-line execution, allowing users to specify both the engine and the operation under test. Results were visualized using matplotlib, and key statistical metrics were reported.
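The benchmarking tool itself is not published as part of this article; a minimal sketch of how such a harness could be structured (hypothetical file names bench.py and run_query.py, with --engine and --operation flags) looks roughly like this:

```python
# bench.py - hypothetical sketch of the benchmarking CLI, not the actual tool
import argparse
import subprocess
import sys


def main() -> None:
    parser = argparse.ArgumentParser(description="Run one engine/operation combination.")
    parser.add_argument("--engine", choices=["duckdb", "polars"], required=True)
    parser.add_argument("--operation", required=True, help="Name of the benchmark query to run")
    parser.add_argument("--runs", type=int, default=10, help="Repetitions per configuration")
    args = parser.parse_args()

    for _ in range(args.runs):
        # Each repetition runs in a fresh Python process so the RSS baseline stays comparable.
        subprocess.run(
            [sys.executable, "run_query.py", args.engine, args.operation],
            check=True,
        )


if __name__ == "__main__":
    main()
```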
Memory Measurement
Initial experiments measured memory usage as the difference in resident set size (RSS) between the start and end of each query. This approach proved misleading because peak memory usage often occurs mid-execution, temporary buffers may be freed before completion, and operating system caching can distort final RSS measurements.
In this benchmark, memory usage is therefore defined as the maximum RSS observed during query execution, measured relative to the starting RSS. This captures the peak incremental working set required by the query rather than the total process footprint.
Process memory was sampled at short intervals of 0.05 seconds, and the highest observed value was recorded. Each test was executed in a fresh process to ensure a consistent baseline. While operating system page caching could not be fully eliminated, this setup reduces persistent in-process caching effects. Every configuration was run ten times, and reported results represent average values.
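A peak-RSS sampler along these lines can be built with psutil. The sketch below polls the current process every 0.05 seconds from a background thread and reports the peak increase over the starting RSS; it is a simplified illustration of the measurement approach, not the exact code used in the benchmark.

```python
import threading
import time

import psutil


def measure_peak_rss(fn, interval: float = 0.05):
    """Run fn() and return (result, peak RSS increase in bytes over the starting RSS)."""
    proc = psutil.Process()
    baseline = proc.memory_info().rss
    peak = baseline
    stop = threading.Event()

    def sample() -> None:
        nonlocal peak
        while not stop.is_set():
            # Track the highest resident set size observed during execution.
            peak = max(peak, proc.memory_info().rss)
            time.sleep(interval)

    sampler = threading.Thread(target=sample, daemon=True)
    sampler.start()
    try:
        result = fn()
    finally:
        stop.set()
        sampler.join()
    return result, peak - baseline
```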
Dataset Generation
The dataset used in this benchmark was generated using DuckDB’s tpch extension, which implements both the data generator and the reference queries for the TPC-H benchmark. The extension is shipped by default with most DuckDB builds and supports easy data generation via scale factors. From the generated dataset, the lineitem table was extracted and exported as a Parquet file using DuckDB.
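In DuckDB this takes only a few lines of SQL. The sketch below uses a hypothetical scale factor of 10 and a placeholder output path; the actual experiments used a range of scale factors.

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL tpch; LOAD tpch;")  # ships with most DuckDB builds
con.sql("CALL dbgen(sf = 10);")      # generate the TPC-H tables at scale factor 10
con.sql("COPY lineitem TO 'lineitem_sf10.parquet' (FORMAT PARQUET);")
```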
Test Design
Three experiments were conducted to capture different scaling behaviors:
1. Single-Table Scaling: A single lineitem table grows from approximately 2 GB to 140 GB.
2. Many Small Files: Up to 72 files of approximately 2 GB each are combined, matching the total data volume of the single-table test while substantially increasing the file count.
3. Few Very Large Files: Multiple 140 GB files are combined, scaling to approximately 1.97 TB in total.
In both of the latter tests, UNION ALL ensures that all input data is processed without deduplication and prevents metadata-only row-count optimizations.
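A sketch of how the combined inputs can be expressed in both engines is shown below; the file names are placeholders, and the projection, filter, and count pattern from the benchmark setup is then applied on top of the combined scan.

```python
import duckdb
import polars as pl

files = [f"lineitem_part_{i}.parquet" for i in range(72)]  # placeholder file names

# DuckDB: UNION ALL over the individual scans ensures every file contributes its rows
# and rules out metadata-only row-count shortcuts.
union_sql = " UNION ALL ".join(f"SELECT * FROM read_parquet('{f}')" for f in files)
combined_relation = duckdb.sql(f"SELECT * FROM ({union_sql})")

# Polars: concatenating the individual lazy scans has the same effect.
combined_lazy = pl.concat([pl.scan_parquet(f) for f in files])
```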
Results
Across all experiments, the logical query remained identical while the physical layout and total dataset size varied from a few gigabytes to nearly 2 TB. This isolates the effects of file size, file count, and execution strategy on runtime and incremental peak memory.
1. Single-Table Scaling
When a single Parquet file grows from approximately 2 GB to 140 GB, Polars and DuckDB exhibit very similar execution times, with Polars maintaining a slight advantage of roughly 1 second at the largest scale.
DuckDB shows a slow and steady increase in memory usage as data volume grows, peaking at around 1.3 GB. In contrast, Polars’ memory usage increases much more aggressively, surpassing 1 GB already at roughly 8 GB of data and reaching approximately 17 GB for the ~140 GB dataset.
2. Many Small Files
When up to 72 Parquet files of approximately 2 GB each are combined to match the data volume of the first test, execution times remain similar, with a slightly more pronounced advantage for DuckDB.
DuckDB’s memory usage remains consistently below 70 MB with only a minor increase as data volume grows. Polars, on the other hand, shows a sharp rise in memory consumption—from about 400 MB for 2 GB to 4.3 GB for 40 GB—after which it roughly plateaus for the remaining datasets.
3. Few Very Large Files
The execution-time trend in this scenario is similar to the second test. For the 2 TB dataset, Polars reaches approximately 1 minute of runtime, while DuckDB completes the workload in roughly 45 seconds.
Polars’ memory usage scales from about 1.7 GB to 23 GB across the tested volumes. DuckDB, in contrast, increases from approximately 500 MB to 2.4 GB, maintaining a significantly lower peak memory footprint throughout.
4. File Layout Comparison
In addition to the previous tests, we directly compare the results of the single-table scenario with the many-small-files scenario, as both represent similar total data volumes but differ in file layout. Neither engine shows a significant execution-time improvement from partitioning the data into smaller files.
In contrast, both engines benefit substantially in terms of memory usage. With partitioned workloads, DuckDB reduces its peak memory consumption by up to 8x, while Polars achieves reductions of up to 4x.
Discussion
Memory Behavior
Across all experimental configurations, DuckDB’s incremental peak memory usage remained below approximately 2.5 GB, even when processing nearly 2 TB of Parquet data. In contrast, Polars exhibited substantially higher peak memory consumption on very large input files, reaching around 20 GB in some cases. When the same data was partitioned into many smaller files, both Polars’ and DuckDB’s incremental peak memory decreased. These results suggest that both engines benefit greatly from partitioned workloads compared to single, massive datasets. This aligns with DuckDB’s documentation, which recommends Parquet file sizes between 100 MB and 10 GB.
Polars and Large Files
While Polars achieves competitive execution times, its peak memory usage increases rapidly for large files. The same workload becomes significantly more memory-efficient when split into many smaller files. This suggests that Polars’ bottleneck lies in row-group and page-level buffering combined with multi-threaded scanning and decompression. With many threads, multiple row groups may be decompressed concurrently, each requiring its own scratch buffers.
In Arrow-style Parquet reading, decompression occurs at the page and column-chunk level, and buffers can be sizable. When workloads are partitioned into smaller files, the number of row groups processed concurrently is reduced, lowering the worst-case sum of decode, mask, and materialization buffers.
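This effect can be made concrete by inspecting the Parquet metadata: the number and size of row groups bound how many decode buffers can be live at once under multi-threaded scanning. A small illustrative check with pyarrow (the path is a placeholder):

```python
import pyarrow.parquet as pq

meta = pq.ParquetFile("lineitem.parquet").metadata
print(f"row groups: {meta.num_row_groups}")

# Print the first few row groups with their uncompressed sizes.
for i in range(min(3, meta.num_row_groups)):
    rg = meta.row_group(i)
    print(f"row group {i}: {rg.num_rows} rows, "
          f"{rg.total_byte_size / 1024**2:.1f} MiB uncompressed")
```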
DuckDB’s architecture, by contrast, enforces strict memory limits through its buffer manager. Although DuckDB also decodes Parquet pages, its execution model is designed to keep the working set bounded and to evict or stream data aggressively, preventing large transient memory spikes.
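The benchmark ran DuckDB with its default settings, but the buffer manager's ceiling can also be lowered explicitly when memory is scarce, for example:

```python
import duckdb

con = duckdb.connect()
con.sql("SET memory_limit = '4GB';")  # cap the buffer manager's working set (default is a fraction of system RAM)
con.sql("SET threads = 8;")           # optionally also limit scan parallelism
```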
Peak vs. Final Memory Usage
Polars allocates substantial transient memory during query execution—primarily for scanning, decompression, and intermediate buffers—but releases most of it afterward, resulting in a post-execution footprint close to the baseline. DuckDB maintains a tighter bound on memory usage throughout execution. For memory-constrained environments, peak usage is therefore more critical than post-execution consumption, as it determines the risk of out-of-memory failures during query execution.
Row Group Size
Row group size plays a critical role in DuckDB’s performance. DuckDB parallelizes Parquet scans at the row-group level; consequently, files with a single large row group can only be processed by a single thread. The DuckDB documentation recommends row groups of 100 K–1 M rows and warns that sizes below 5,000 rows can degrade performance by 5–10x. Above roughly 100,000 rows, performance differences become negligible.
By comparison, Polars’ default row group size is approximately 250 K rows. Row-group configuration may influence both engines’ parallelism and memory behavior, particularly for large files. We therefore created the Parquet files explicitly with a row group size of 300,000.
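Since the benchmark files were exported with DuckDB, the 300,000-row setting can be applied directly in the COPY statement. A minimal sketch with placeholder paths:

```python
import duckdb

# Rewrite an existing Parquet file with explicit 300,000-row row groups.
duckdb.sql("""
    COPY (SELECT * FROM read_parquet('lineitem.parquet'))
    TO 'lineitem_rg300k.parquet'
    (FORMAT PARQUET, ROW_GROUP_SIZE 300000);
""")
```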
Initial tests were conducted with roughly half of Polars' preferred row group size, which is the default setting for Parquet files generated by DuckDB. With the larger 300,000-row groups, DuckDB's memory usage decreased substantially, for example from 1.3 GB to 500 MB on the 140 GB dataset, while Polars' memory usage remained relatively stable.
Execution times, by contrast, improved significantly for both engines: the 2 TB dataset completed in 60 s with Polars (previously 170 s) and in 44 s with DuckDB (previously 70 s). Although Parquet is an open, engine-agnostic format, these results show that physical layout choices, such as row group size, can implicitly favor certain execution engines. In practice, storage and compute are therefore not entirely independent, even when using open formats.
Conclusion
Both DuckDB and Polars demonstrate strong performance on large-scale analytical workloads, but their behavior differs markedly in terms of memory usage, scalability, and sensitivity to Parquet file layout. For datasets in the hundreds of gigabytes to terabytes, DuckDB consistently exhibits lower incremental peak memory, more stable scaling characteristics, and robust throughput—even when operating on very large, monolithic Parquet files. Its vectorized, memory-bounded execution model enables efficient parallel processing while maintaining a tightly controlled working set.
Polars, in contrast, achieves competitive execution times but shows substantially higher transient memory usage when processing large files. Its performance and memory efficiency improve significantly when data is partitioned into many smaller Parquet files, suggesting that its streaming and buffering mechanisms benefit from finer-grained file layouts. Polars therefore remains a strong choice for DataFrame-oriented workflows and scenarios where file partitioning is feasible and sufficient memory headroom is available.
Overall, these results highlight that analytical performance is shaped primarily by execution strategy, file layout, and workload characteristics rather than by raw engine capability alone. There is no universally optimal system: DuckDB is particularly well suited for memory-constrained, large-scale analytical scans over massive Parquet files, while Polars excels in flexible, DataFrame-centric environments with well-partitioned data.
While neither engine dominates across all scenarios in terms of execution time, both demonstrate exceptional performance. Few existing analytical systems match the speed of DuckDB and Polars on analytical workloads, as illustrated by the benchmarks published by Polars.
Finally, although Parquet is an open, engine-agnostic format, our results underline that physical layout choices—such as row group size—implicitly favor certain execution engines. In practice, storage and compute are therefore more tightly coupled than often assumed, even when using open data formats.
If you are interested in leveraging DuckDB’s demonstrated capabilities in the cloud, enroll in our on-demand Hands-on Workshop: Introduction to MotherDuck for a complete practical walkthrough!