Efficient processing of large, structured datasets is central to modern data analysis. Pandas has long been Python’s default DataFrame library, valued for its flexibility, rich ecosystem, and intuitive API. As datasets grow beyond memory and performance demands rise, newer tools like Polars and DuckDB have gained traction. While Polars and Pandas are DataFrame libraries and DuckDB is an embedded SQL analytics engine, all three aim to make large-scale data work faster and simpler—through parallel execution, lazy computation, and out-of-core processing.
This article compares Pandas, Polars, and DuckDB across performance, memory usage, scalability, ergonomics, and interoperability. We’ll highlight when a DataFrame-first workflow shines, when SQL-first tooling is better, and how these tools complement each other in real-world pipelines.
Background
Pandas
Pandas remains the most widely adopted DataFrame library, offering a mature API and seamless integration with the Python scientific stack. It shines in prototyping, data cleaning, and exploratory analysis. By default, pandas executes operations eagerly and largely single-threaded, and typical workflows assume data fits in memory, so very large files (e.g., tens of GB) can lead to memory pressure or crashes unless you chunk or downsample. Recent releases (pandas 2.x) add copy-on-write and optional Arrow-backed dtypes that improve memory efficiency and interoperability, but pandas is not an out-of-core or parallel analytics engine.
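A minimal sketch of this default eager workflow, including the pandas 2.x options mentioned above; the file name is a placeholder:

import pandas as pd

# Copy-on-write, available since pandas 2.0.
pd.options.mode.copy_on_write = True

# Eager read: the whole file is parsed and materialized in memory at once.
# dtype_backend="pyarrow" opts into the Arrow-backed dtypes (pandas 2.x).
df = pd.read_csv("events.csv", dtype_backend="pyarrow")  # placeholder path

# Typical exploratory work on the in-memory DataFrame.
purchases = df[df["event_type"] == "purchase"]
print("Count:", len(purchases))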
Polars
Polars is a modern, columnar DataFrame library written in Rust with Python bindings, leveraging Apache Arrow for efficient memory representation. It supports multithreaded execution by default and offers both eager and lazy APIs. The lazy engine enables query optimization (projection and predicate pushdown) and, when used with streaming execution, can process datasets larger than RAM for many workloads. Not all operations stream, but for medium to large datasets the combination of parallelism, lazy optimization, and streaming often yields substantial speed and memory benefits. While its ecosystem is still growing and some pandas-specific features aren’t mirrored, Polars’ rapid development and optional GPU acceleration make it a compelling choice for high-performance data processing.
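A minimal sketch of the lazy scan plus streaming collection described above; the file name is a placeholder, and the query mirrors the filter-and-count workload used later in the benchmarks:

import polars as pl

# Lazy scan: nothing is read from disk yet.
lf = pl.scan_csv("events.csv")  # placeholder path

# The filter can be pushed into the scan; count() reduces the result to row counts.
query = lf.filter(pl.col("event_type") == "purchase").count()

# Streaming collection processes the file in batches instead of materializing it.
# (The exact streaming flag differs across Polars versions.)
print(query.collect(streaming=True))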
DuckDB
DuckDB is a modern, in-process OLAP SQL engine written in C++ and designed for high-performance analytical queries. It uses a vectorized, pipelined execution engine with a cost-based optimizer, supports multithreaded execution, and can process datasets larger than RAM via streaming scans and automatic spilling to disk for operations like sorts and aggregates. DuckDB can execute SQL directly over CSV files as well as in-memory DataFrames (pandas, Polars) and Arrow tables, enabling a seamless SQL-first workflow. It excels at complex joins, aggregations, and group-bys. DuckDB’s performance, scalability, and interoperability make it a powerful building block for analytics pipelines.
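A minimal sketch of both modes, assuming a placeholder file name; the second query relies on DuckDB's replacement scans, which resolve Python variables such as pandas DataFrames by name:

import duckdb
import pandas as pd

# SQL directly over the file: DuckDB scans the CSV itself, no DataFrame is built.
duckdb.sql("""
    SELECT COUNT(*) AS purchase_count
    FROM read_csv_auto('events.csv')
    WHERE event_type = 'purchase'
""").show()

# SQL over an in-memory DataFrame: the local variable df is picked up by name.
df = pd.read_csv("events.csv")
duckdb.sql("SELECT event_type, COUNT(*) AS n FROM df GROUP BY event_type").show()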
Methodology
Benchmarking principles
Benchmarking is more than running the same operation on different systems. To ensure fair and reproducible results, we follow established benchmarking best practices: we make all scripts and environments available for review and adhere to guidelines from experts in the field, such as DuckDB's own Hannes Mühleisen.
We benchmark each tool using its idiomatic, built‑in workflow for large, structured data—without special tuning or external extensions. DuckDB is used as a SQL‑first engine, querying files directly. Polars is used via its lazy scan API with streaming enabled, which is the documented approach for processing large files efficiently. Pandas is used with its standard eager DataFrame construction (read_csv) because it has no integrated out‑of‑core or lazy engine. This approach reflects how practitioners naturally solve the task in each tool while avoiding configuration knobs (thread counts, PRAGMAs, alternative parsers, GPU backends).
Test setup
We benchmarked three core OLAP operations—filtering and counting, filtering with grouping and aggregation, and grouping with conditional aggregation—using a real-world ecommerce dataset (CSV, 9 GB, 67 million rows, 9 columns). All tests were performed on a 2021 MacBook Pro (M1 Max, 32 GB RAM) using Python. Our benchmarking tool ensures consistent and reproducible command-line execution, allowing users to specify the tool, operation, benchmark mode (cold or hot), and number of runs. Results are visualized with matplotlib, and we report key statistical metrics: mean, standard deviation, and coefficient of variation.
Memory usage is measured by recording the memory consumed immediately before and after each function call; the difference represents the memory used by the function. Hot runs leverage OS page cache and library buffers. Cold runs execute in isolated processes with randomized file access; macOS page cache is not force-flushed, so results represent “colder” rather than fully uncached scenarios. For both modes, each operation is repeated 10 times.
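The line-by-line tables in the results section are the kind of output produced by the memory_profiler package; the sketch below shows one way to generate them (the script name and runner are assumptions, the actual harness lives in the repository):

# benchmark_pandas.py -- hypothetical file name
from memory_profiler import profile
import pandas as pd

dataset_path = "events.csv"  # placeholder path

@profile
def filtering_counting():
    df = pd.read_csv(dataset_path)
    purchases = df[df["event_type"] == "purchase"]
    print("Count:", len(purchases))

if __name__ == "__main__":
    filtering_counting()  # prints a line-by-line memory table on completion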
We publish the exact scripts and environment details so readers can rerun the idiomatic paths as described in our repository. No special flags or extensions are required beyond installing the standard packages; results should be stable across similar hardware and OS configurations.
Results
For our article, we focus on the filtering and counting operation as a representative example. Detailed values and plots for all benchmarks are available in the repository.
Cold runs
Cold benchmark results highlight DuckDB’s superior memory efficiency. Polars nearly matches DuckDB's execution time and uses significantly less memory than Pandas, but it still requires roughly three times as much memory as DuckDB.
Pandas shows high overhead due to parsing the CSV and materializing a DataFrame. In cold runs, this consumes roughly 10 GB, slightly more than the 9 GB CSV on disk, and filtering for purchase events adds about 1 GB more.
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    142.7 MiB    142.7 MiB           1   @profile
     8                                         def filtering_counting():
     9  10600.2 MiB  10457.5 MiB           1       df = pd.read_csv(dataset_path)
    10  11494.2 MiB    894.0 MiB           1       purchases = df[df["event_type"] == "purchase"]
    11  11494.4 MiB      0.2 MiB           1       print("Count:", len(purchases))
In eager mode, Polars typically materializes a DataFrame from the entire dataset, which can lead to substantial memory usage. By contrast, when using a lazy CSV scan, Polars avoids loading the full dataset into memory and instead processes only the rows required for the specific operation. This approach enables efficient batch processing rather than loading everything at once, resulting in clear memory savings. Additionally, Polars’ lazy engine supports streaming execution, which further reduces memory consumption by processing data in smaller, manageable chunks.
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    142.8 MiB    142.8 MiB           1   @profile
     8                                         def filtering_counting():
     9    143.2 MiB      0.4 MiB           1       lf = pl.scan_csv(dataset_path)
    10   1681.7 MiB   1538.5 MiB           1       result = lf.filter(pl.col("event_type") == "purchase").count().collect(streaming=True)
    11   1681.9 MiB      0.2 MiB           1       print(result)
DuckDB queries the CSV directly, using only ~300 MB for the entire process and returning results faster thanks to late materialization, predicate pushdown, and vector-at-a-time pipelined execution.
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    143.0 MiB    143.0 MiB           1   @profile
     8                                         def filtering_counting():
     9    461.9 MiB    318.8 MiB           1       duckdb.sql(f"SELECT COUNT(*) AS purchase_count FROM read_csv_auto('{dataset_path}') WHERE event_type = 'purchase'").show()
Hot runs
In hot benchmarks, DuckDB’s advantage grows. Both Pandas and Polars reduce memory usage after initial runs and frequently alternate between freeing and consuming memory—yet remain higher and less stable than DuckDB. Interestingly, Polars' memory usage varies more than Pandas'.
Pandas thus narrows the gap to Polars.
Comparing hot and cold runs, DuckDB’s memory usage stays consistently low, a sign of strong internal optimization. None of the tools gains meaningful time savings in hot runs; Pandas shows the most significant improvement in memory usage.
Over 10 hot runs, both Pandas and Polars continue to optimize their memory usage and (very) slowly approach DuckDB’s cold-run memory profile—though they do not match it.
Discussion
These benchmarks underscore how architecture and execution models drive the behavior of Pandas, Polars, and DuckDB on large datasets. Results depend on data format, schema (string vs numeric), filter selectivity, and storage characteristics; our findings reflect a ~9 GB CSV on a local SSD.
Pandas is consistently slower and less predictable in memory because it eagerly materializes a full DataFrame from CSV and executes largely single‑threaded. While many operations are array‑level vectorized via NumPy, pandas lacks a query optimizer and a database‑style vectorized/pipelined engine, and it does not offer integrated lazy or out‑of‑core execution. For big files, this translates into higher peak memory, more Python‑object overhead (especially for strings), and limited parallelism. Advanced patterns (e.g., read_csv with chunksize/iterators) can reduce memory but change the programming model and fall outside our idiomatic comparison.
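For completeness, a sketch of the chunked-reading pattern mentioned above (placeholder path); it bounds peak memory by the chunk size but changes the programming model:

import pandas as pd

purchase_count = 0
# Each chunk is an ordinary DataFrame of at most one million rows, so peak
# memory is bounded by the chunk size rather than the file size.
for chunk in pd.read_csv("events.csv", chunksize=1_000_000):
    purchase_count += (chunk["event_type"] == "purchase").sum()

print("Count:", purchase_count)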
Polars, built in Rust on Arrow, leverages multithreading and a columnar expression engine to outperform pandas on speed. Its lazy API with streaming mode can avoid full materialization for many queries by pushing filters/projections into scans. Streaming is not universal: operations that require cross‑batch context or materialize intermediates may still consume several gigabytes of RAM. In our filter‑and‑count workload and environment, we observed peak memory around ~5 GB with lazy+streaming; actual figures vary with schema and selectivity. Overall, Polars substantially improves memory efficiency and runtime versus pandas but typically does not match DuckDB for large, on‑disk analytics.
DuckDB delivers consistently low memory usage and strong performance by combining a cost‑based optimizer with vectorized, pipelined execution and late materialization. It pushes predicates and projections into file scans and streams data without fully materializing tables, which keeps peak RSS small and stable. Running in‑process with C++ data structures avoids Python‑object overhead, and multicore utilization is automatic, making DuckDB well‑suited for fast ad‑hoc analytics over large CSV files.
Why is Pandas so slow?
DuckDB and Polars process only the necessary columns, execute in cache‑friendly vectors, and leverage parallel pipelines; pandas eagerly builds full DataFrames and lacks a query optimizer or a database‑style execution engine. Despite array‑level vectorization via NumPy, pandas’ typical usage incurs more computation, more memory traffic, and limited parallelism at scale. Without integrated lazy/out‑of‑core execution, pandas’ idiomatic path remains memory‑bound. While chunked reading can help, it changes the workflow and isn’t directly comparable to SQL/lazy pipelines.
Why is DuckDB faster than Polars?
DuckDB generally does less work and keeps more of it in tight, optimized pipelines. It aggressively pushes filters/projections, uses vector‑at‑a‑time pipelines with late materialization, and avoids Python overhead. Polars is also lazy, columnar, and multi‑threaded, but many workloads still materialize intermediates or require cross‑batch context, so transient memory is often higher and end‑to‑end time can lag DuckDB for large on‑disk analytics.
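One way to see how DuckDB plans such a query is to prepend EXPLAIN; the exact operators shown depend on the DuckDB version and file format, but how the scan handles projections and filters is usually visible:

import duckdb

duckdb.sql("""
    EXPLAIN
    SELECT COUNT(*) AS purchase_count
    FROM read_csv_auto('events.csv')   -- placeholder path
    WHERE event_type = 'purchase'
""").show()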
Conclusion
Pandas, Polars, and DuckDB each excel in different parts of modern analytics. For small to medium datasets and rich library integration, pandas remains a productive default. For larger or performance‑sensitive workloads, Polars and DuckDB deliver substantial gains via parallel execution and out‑of‑core/lazy pipelines. In our CSV‑based benchmarks, DuckDB provided the most consistent performance and the lowest peak memory by querying files directly without fully materializing tables.
DuckDB is strongest for SQL‑first, on‑disk analytics; Polars shines for fast DataFrame transformations (especially with lazy + streaming); pandas remains ideal for interactive munging on moderate‑sized, in‑memory data. The most effective strategy is blended: use each tool where its architecture aligns with the problem. Results may vary with data format (CSV vs Parquet), schema (string vs numeric), filter selectivity, and hardware.
For questions, issues, or to contribute benchmarks, please open an issue or pull request in the repository!
Blog author
Niklas Niggemann
Working Student Data & AI
Do you still have questions? Just send me a message.