Update 10.12.25 – After helpful insights from Polars engineer Thijs Nieuwdorp following the initial posting of this article, we refactored our use of Polars' .count(), which counts non-null values in every column, replacing it with the more efficient .select(pl.len()).
Efficient processing of large, structured datasets is central to modern data analysis. Pandas has long been Python’s default DataFrame library, valued for its flexibility, rich ecosystem, and intuitive API. As datasets grow beyond memory and performance demands rise, newer tools like Polars and DuckDB have gained traction. While Polars and Pandas are DataFrame libraries and DuckDB is an embedded SQL analytics engine, all three aim to make large-scale data work faster and simpler—through parallel execution, lazy computation, and out-of-core processing.
This article compares Pandas, Polars, and DuckDB across performance, memory usage, scalability, ergonomics, and interoperability. We’ll highlight when a DataFrame-first workflow shines, when SQL-first tooling is better, and how these tools complement each other in real-world pipelines.
Background
Pandas
Pandas remains the most widely adopted DataFrame library, offering a mature API and seamless integration with the Python scientific stack. It shines in prototyping, data cleaning, and exploratory analysis. By default, pandas executes operations eagerly and largely single-threaded, and typical workflows assume data fits in memory, so very large files (e.g., tens of GB) can cause memory pressure or crashes unless you chunk or downsample. Recent releases (pandas 2.x) add copy-on-write and optional Arrow-backed dtypes that improve memory efficiency and interoperability, but pandas is not an out-of-core or parallel analytics engine.
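As a rough illustration of that chunking escape hatch (not used in our benchmarks, where we stick to the idiomatic eager path), a filter-and-count can be expressed with pandas' chunked CSV reader; the file path and column name here are placeholders:

```python
import pandas as pd

def count_purchases_chunked(path: str, chunksize: int = 1_000_000) -> int:
    """Count 'purchase' rows without materializing the whole CSV.

    Only one chunk lives in memory at a time, which bounds peak usage
    at the cost of a less direct programming model than one read_csv call.
    """
    total = 0
    for chunk in pd.read_csv(path, chunksize=chunksize):
        total += int((chunk["event_type"] == "purchase").sum())
    return total
```

Peak memory then scales with the chunk size rather than the file size, which is why we treat this as an advanced pattern rather than pandas' default workflow.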
Polars
Polars is a modern, columnar DataFrame library written in Rust with Python bindings, leveraging Apache Arrow for efficient memory representation. It supports multithreaded execution by default and offers both eager and lazy APIs. The lazy engine enables query optimization (projection and predicate pushdown) and, when used with streaming execution, can process datasets larger than RAM for many workloads. Not all operations stream, but for medium to large datasets the combination of parallelism, lazy optimization, and streaming often yields substantial speed and memory benefits. While its ecosystem is still growing and some pandas-specific features aren’t mirrored, Polars’ rapid development and optional GPU acceleration make it a compelling choice for high-performance data processing.
DuckDB
DuckDB is a modern, in-process OLAP SQL engine written in C++ and designed for high-performance analytical queries. It uses a vectorized, pipelined execution engine with a cost-based optimizer, supports multithreaded execution, and can process datasets larger than RAM via streaming scans and automatic spilling to disk for operations like sorts and aggregates. DuckDB can execute SQL directly over CSV files as well as in-memory DataFrames (pandas, Polars) and Arrow tables, enabling a seamless SQL-first workflow. It excels at complex joins, aggregations, and group-bys. DuckDB’s performance, scalability, and interoperability make it a powerful building block for analytics pipelines.
Methodology
Benchmarking Principles
Benchmarking is more than running the same operation on different systems. To ensure fair and reproducible results, we make all scripts and environments available for review and adhere to benchmarking guidelines established by experts in the field, such as DuckDB's own Hannes Mühleisen.
We benchmark each tool using its idiomatic, built-in workflow for large, structured data—without special tuning or external extensions. DuckDB is used as a SQL-first engine, querying files directly. Polars is used via its lazy scan API with streaming enabled, which is the documented approach for processing large files efficiently. Pandas is used with its standard eager DataFrame construction (read_csv) because it has no integrated out-of-core or lazy engine. This approach reflects how practitioners naturally solve the task in each tool while avoiding configuration knobs (thread counts, PRAGMAs, alternative parsers, GPU backends).
Test Setup
We benchmarked core OLAP operations—filtering and counting—using a real-world ecommerce dataset (CSV, 9 GB, 67 million rows, 9 columns). All tests were performed on a 2021 MacBook Pro (M1 Max, 32 GB RAM) using Python. Our benchmarking tool ensures consistent and reproducible command-line execution, allowing users to specify the tool, operation, benchmark mode (cold or hot), and number of runs. Results are visualized with matplotlib, and we report key statistical metrics: mean, standard deviation, and coefficient of variation.
Memory usage is measured by recording the memory consumed immediately before and after each function call; the difference represents the memory used by the function. Hot runs leverage OS page cache and library buffers. Cold runs execute in isolated processes with randomized file access; macOS page cache is not force-flushed, so results represent “colder” rather than fully uncached scenarios. For both modes, each operation is repeated 10 times.
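Our tooling records process-level memory (RSS); the sketch below illustrates the same before/after idea using the standard library's tracemalloc, which tracks Python-level allocations only, so its numbers will not match RSS-based figures:

```python
import tracemalloc

def measure(func, *args, **kwargs):
    """Run func and report the net and peak Python-level allocation in bytes."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    result = func(*args, **kwargs)
    after, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # The difference before/after is the memory retained by the call;
    # peak captures transient spikes during execution.
    return result, after - before, peak

data, delta, peak = measure(lambda n: list(range(n)), 100_000)
print(f"net={delta} B, peak={peak} B")
```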
We publish the exact scripts and environment details so readers can rerun the idiomatic paths as described in our repository. No special flags or extensions are required beyond installing the standard packages; results should be stable across similar hardware and OS configurations.
Results
Cold Runs
Cold benchmark results highlight DuckDB’s memory efficiency. Polars nearly matches DuckDB's execution time and memory usage, while Pandas lags significantly behind both tools in both metrics.
Pandas shows high overhead because it parses the CSV and materializes a full DataFrame. In cold runs, read_csv alone adds roughly 13 GB of memory on top of the 9 GB CSV, and filtering for purchase events adds about 1.4 GB more.
```
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    142.2 MiB    142.2 MiB           1   @profile
     8                                         def filtering_counting():
     9  13112.7 MiB  12970.5 MiB           1       df = pd.read_csv(dataset_path)
    10  14491.0 MiB   1378.3 MiB           1       purchases = df[df["event_type"] == "purchase"]
    11  14491.1 MiB      0.1 MiB           1       print("Count:", len(purchases))
```
In eager mode, Polars materializes a DataFrame from the entire dataset, which can lead to substantial memory usage. With a lazy CSV scan, by contrast, Polars avoids loading the full dataset into memory and processes only the rows required for the specific operation, yielding clear memory savings. Additionally, Polars' lazy engine supports streaming execution, which further reduces memory consumption by processing the data in smaller, manageable batches.
```
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    142.8 MiB    142.8 MiB           1   @profile
     8                                         def filtering_counting():
     9    143.2 MiB      0.4 MiB           1       lf = pl.scan_csv(dataset_path)
    10   1681.7 MiB   1538.5 MiB           1       result = lf.filter(pl.col("event_type") == "purchase").count().collect(streaming=True)
    11   1681.9 MiB      0.2 MiB           1       print(result)
```
Initially, we thought this would be the maximum for Polars. But after we published this article on LinkedIn, Thijs Nieuwdorp, Developer Relations Engineer at Polars, reached out and pointed out an oversight:
I dove into it a little deeper and noticed that the DuckDB COUNT(*) doesn't translate well to our .count(). The latter counts the number of non-null elements in every column, causing us to scan the entire file. Instead, you could replace the .count() with .select(pl.len()) and get the same result as DuckDB, which should be a single column with one value, the length of the DataFrame.
We were happy to implement this suggestion, which significantly reduced Polars' memory usage by about 1 GB, as Thijs correctly observed:
```
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    142.1 MiB    142.1 MiB           1   @profile
     8                                         def filtering_counting():
     9    142.5 MiB      0.4 MiB           1       lf = pl.scan_csv(dataset_path)
    10    592.7 MiB    450.3 MiB           1       result = lf.filter(pl.col("event_type") == "purchase").select(pl.len()).collect(streaming=True)
    11    593.0 MiB      0.2 MiB           1       print(result)
```
DuckDB queries the CSV directly, using only ~300 MB for the entire run and returning results quickly thanks to predicate pushdown, late materialization, and vectorized, pipelined execution.
```
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    142.2 MiB    142.2 MiB           1   @profile
     8                                         def filtering_counting():
     9    428.3 MiB    286.1 MiB           1       duckdb.sql(f"SELECT COUNT(*) AS purchase_count FROM read_csv_auto('{dataset_path}') WHERE event_type = 'purchase'").show()
```
Hot Runs
In hot benchmarks, DuckDB’s advantage shrinks. Pandas reduces memory usage after initial runs and frequently alternates between freeing and consuming memory—yet still remains higher than the competition. Polars' memory usage also varies, but it benefits from a lower base level and frequent memory release.
Polars is thus able to close the gap with DuckDB.
None of the tools achieve significant time savings during hot runs. Over 10 hot runs, Pandas uses about 10 GB less memory than in its cold runs, yet it still consumes substantially more than Polars and DuckDB do even in their cold runs. Polars saves more memory in hot runs, relative to its cold runs, than DuckDB does, which lets it match DuckDB's efficiency.
Discussion
These benchmarks underscore how architecture and execution models drive the behavior of Pandas, Polars, and DuckDB on large datasets. Results depend on data format, schema (string vs. numeric), filter selectivity, and storage characteristics; our findings reflect a 9 GB CSV on a local SSD.
Pandas is consistently slower and less predictable in memory because it eagerly materializes a full DataFrame from CSV and executes largely single-threaded. While many operations are array-level vectorized via NumPy, pandas lacks a query optimizer and a database-style vectorized/pipelined engine, and it does not offer integrated lazy or out-of-core execution. For big files, this translates into higher peak memory, more Python-object overhead (especially for strings), and limited parallelism. Advanced patterns (e.g., read_csv with chunksize/iterators) can reduce memory but change the programming model and fall outside our idiomatic comparison.
Polars, built in Rust on Arrow, leverages multithreading and a columnar expression engine to outperform pandas on speed. Its lazy API with streaming mode can avoid full materialization for many queries by pushing filters/projections into scans. Streaming is not universal: operations that require cross-batch context or materialize intermediates may still consume several gigabytes of RAM. In our filter-and-count workload and environment, we observed peak memory around ~0.5 GB with lazy+streaming; actual figures vary with schema and selectivity. Overall, Polars substantially improves memory efficiency and runtime versus pandas and even matches DuckDB during hot runs for large, on-disk analytics.
DuckDB delivers consistently low memory usage and strong performance by combining a cost-based optimizer with vectorized, pipelined execution and late materialization. It pushes predicates and projections into file scans and streams data without fully materializing tables, keeping peak RSS small and stable. Running in-process with C++ data structures avoids Python-object overhead, and multicore utilization is automatic, making DuckDB well-suited for fast ad-hoc analytics over large CSV files.
Why is Pandas so slow?
DuckDB and Polars process only the necessary columns, execute in cache-friendly vectors, and leverage parallel pipelines; pandas eagerly builds full DataFrames and lacks a query optimizer or a database-style execution engine. Despite array-level vectorization via NumPy, pandas’ typical usage incurs more computation, more memory traffic, and limited parallelism at scale. Without integrated lazy/out-of-core execution, pandas’ idiomatic path remains memory-bound. While chunked reading can help, it changes the workflow and isn’t directly comparable to SQL/lazy pipelines.
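One way to approximate projection pushdown manually in pandas is to ask the CSV parser for only the column the query needs; the sketch below generates a tiny CSV so it is self-contained, and the column and file names are illustrative:

```python
import pandas as pd

# Write a tiny CSV so the example is self-contained.
with open("events_small.csv", "w") as f:
    f.write("event_type,price\nview,0.0\npurchase,9.99\npurchase,4.50\n")

# usecols tells the parser to materialize a single column, mimicking the
# automatic projection pushdown that DuckDB and Polars perform in scans.
df = pd.read_csv("events_small.csv", usecols=["event_type"])
purchase_count = int((df["event_type"] == "purchase").sum())
print(purchase_count)
```

Even so, pandas still parses the file eagerly and row-by-row into memory; the pruning only shrinks what is kept, not how the scan executes.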
Conclusion
Pandas, Polars, and DuckDB each excel in different parts of modern analytics. For small to medium datasets and rich library integration, pandas remains a productive default. For larger or performance-sensitive workloads, Polars and DuckDB deliver substantial gains via parallel execution and out-of-core/lazy pipelines. In our CSV-based benchmarks, DuckDB provided the most consistent performance and the lowest peak memory by querying files directly without fully materializing tables. Thanks to the suggestion from the Polars team, Polars reduced its memory usage by about 1 GB and now trails DuckDB by only a couple hundred megabytes. This highlights how small implementation details can have a significant impact on real-world performance and memory efficiency.
This exchange of knowledge between tool developers and practitioners is vital for the progress of the data ecosystem. Open dialogue—such as the feedback we received from Thijs Nieuwdorp at Polars—not only helps correct misunderstandings and improve benchmarking accuracy, but also accelerates the adoption of best practices across the community. By sharing insights and collaborating openly, we ensure that both tools and users evolve together, leading to more robust, efficient, and user-friendly solutions. Such interactions highlight the importance of transparency, humility, and continuous learning in the rapidly changing landscape of data analytics.
DuckDB is strongest for SQL-first, on-disk analytics; Polars shines for fast DataFrame transformations (especially with lazy + streaming); pandas remains ideal for interactive munging on moderate-sized, in-memory data. The most effective strategy is blended: use each tool where its architecture aligns with the problem. Results may vary with data format (CSV vs Parquet), schema (string vs numeric), filter selectivity, and hardware.
If you are interested in leveraging DuckDB's demonstrated power into the cloud, enroll in our on-demand Hands-on Workshop: Introduction to MotherDuck for a complete practical walkthrough!
Blog author
Niklas Niggemann
Working Student Data & AI
Do you still have questions? Just send me a message.