Ibis: Selecting the Right Execution Engine Without Rewriting Your Logic
In our previous benchmarks, DuckDB consistently outperformed Polars and Pandas on large analytical workloads, but performance comparisons miss a critical question: what happens when you need to move from local DuckDB development to a BigQuery production environment, or migrate an entire data platform from Spark to Snowflake? Rewriting transformation logic because infrastructure changed is expensive and avoidable.
Modern analytical engines are highly capable, but once multiple tools prove "fast enough" for real-world use, the engineering challenge shifts from raw speed to flexibility. Data teams prototype locally, deploy to cloud warehouses, and later migrate platforms for cost, scalability, or operational reasons. When analytical logic is tightly coupled to a specific engine, these transitions consume hundreds of engineering hours, even when the underlying computation remains conceptually identical.
This article explores how Ibis decouples analytical intent from execution engines, enabling teams to write transformation logic once and run it across different backends without costly rewrites. Rather than competing on performance alone, Ibis competes on portability, maintainability, and architectural adaptability.
The Portability Problem: When Databases Don't Speak the Same Language
Data engineering operates across two dominant paradigms, Python DataFrames and SQL, and both face the same structural problem: every database system exposes its own APIs and SQL dialects. This fragmentation creates concrete portability barriers.
These differences accumulate rapidly. A team migrating multiple SQL pipelines might spend weeks rewriting dialect-specific queries, testing edge cases, and debugging subtle semantic differences. Ibis addresses this by providing a consistent interface for expressing analytical intent while translating to backend-specific implementations automatically.
Historical Context
Pandas was designed in 2008 for interactive, in-memory analysis on datasets that fit in RAM. As analytical workloads increasingly involve multi-terabyte datasets and distributed execution, the tight coupling between Pandas' API and eager execution became a constraint.
Ibis was designed to address this by separating analytical intent from execution strategy, an approach influenced by dplyr and the R ecosystem, where the DataFrame abstraction originated.
Why Decoupling Logic from Execution Matters
Ibis is built on a core architectural principle: analytical intent should be defined independently of the execution engine. This separation delivers concrete engineering and business value. Logic tightly coupled to production infrastructure is difficult to test locally; engineers wait for CI/CD pipelines or develop against expensive cloud sandboxes. Portable queries run identically on local DuckDB and production BigQuery, enabling fast iteration with confidence that local results match production behavior. And as systems span local machines, cloud warehouses, and distributed compute platforms, portable query logic allows teams to optimize infrastructure choices (cost, performance, compliance requirements) without disrupting analytical workflows.
Ibis: A Backend-Agnostic DataFrame API
Ibis provides a DataFrame-style API that is designed to be independent of any specific execution engine. While the syntax resembles Pandas, operations are not executed eagerly. Instead, Ibis builds a symbolic representation of the query.
In many local setups, Ibis uses DuckDB as an embedded backend by default, making it convenient for experimentation and prototyping. The same query logic can later be executed on distributed systems or cloud data warehouses with minimal changes.
```python
import ibis

con = ibis.connect("duckdb://")
t = con.read_csv("data.csv")

expr = (
    t.filter(t.value > 100)
    .group_by(t.category)
    .aggregate(total=t.value.sum())
)

expr.execute()
```
No computation occurs until `execute()` is called. Until then, Ibis only builds a symbolic representation of the intended transformation.
Execution Model: What Ibis Does and Doesn't Do
Ibis does not execute queries itself. It functions as a query compiler that translates DataFrame operations into backend-specific SQL and delegates execution entirely to the target engine. Query planning, optimization, and execution are handled by the backend (DuckDB, BigQuery, Spark, etc.).
Ibis performance is effectively DuckDB performance (or BigQuery performance, or Snowflake performance). The translation layer adds minimal overhead. Once the SQL is generated, execution speed depends entirely on the backend's capabilities.
Not all backends support all operations. Ibis maintains a compatibility matrix showing which operations work on which engines. In practice, common analytical operations (filters, aggregations, joins, window functions) are well-supported across major backends.
Portability as a Core Feature
Switching execution engines requires only a change in the connection configuration:
```python
ibis.connect("duckdb://")
ibis.connect("polars://")
ibis.connect("pyspark://")
ibis.connect("bigquery://")
```
Backend support varies in maturity, but the abstraction remains consistent. A workflow can be developed locally with DuckDB and later executed on BigQuery or Spark without rewriting the transformation logic.
SQL Generation and Dialect Translation
Most Ibis backends generate SQL and rely on SQLGlot for cross-dialect translation. SQLGlot translates queries to conform to the target engine's syntax, while optimization remains the database's responsibility.
For transparency and debugging, Ibis allows users to inspect generated SQL:
```python
print(ibis.to_sql(expr))
```
DataFrame expressions can be combined with raw SQL where needed, providing escape hatches when the abstraction doesn't fit specific use cases.
Apache Arrow and Interoperability
Supporting multiple backends would be far more difficult without Apache Arrow, a standardized in-memory columnar format. Arrow enables efficient data exchange between engines and client libraries.
Arrow enables zero-copy conversion when engines share compatible memory layouts, for example, DuckDB ↔ Polars or PyArrow ↔ DuckDB. Data can be passed between systems without serialization or copying, dramatically reducing overhead.
Pandas conversions often still require copies due to its NumPy-backed memory model, which predates Arrow's design. The pandas 2.x series introduced optional Arrow-backed dtypes (pd.ArrowDtype) that enable zero-copy interoperability, but adoption requires explicit opt-in.
Although largely invisible to end users, Arrow is a foundational component enabling Ibis to move data between backends efficiently.
Developer Experience
Ibis integrates naturally with standard Python workflows:
- Tables convert to Pandas, Polars, or PyArrow with `.to_pandas()`, `.to_polars()`, or `.to_pyarrow()`
- Queries are lazy by default; `.execute()` triggers computation
- DataFrame expressions combine freely with raw SQL via the `.sql()` method
- Python testing frameworks (pytest, unittest) validate analytical logic locally
Maturity and Ecosystem
Ibis is a mature, production-ready project backed by Voltron Data (the company behind Apache Arrow). Originally created by Wes McKinney (creator of Pandas) in 2015, Ibis has been in active development for nearly a decade.
- Corporate backing: Voltron Data provides full-time engineering resources
- Community: Active development community, responsive maintainers, regular releases
- Production usage: Used by companies including Bloomberg, RStudio/Posit, and various data teams
- Backend support: 20+ backends with varying maturity levels (DuckDB, BigQuery, Snowflake, Postgres, Spark are well-supported)
Ibis offers production-grade stability and ecosystem maturity, making it a reliable choice for analytical portability needs.
Conclusion
Our benchmarks showed DuckDB consistently outperforming Polars and Pandas for large analytical workloads. But performance alone doesn't determine architectural success.
Data teams prototype locally, deploy to cloud warehouses, and migrate platforms for cost, scalability, and operational reasons. When analytical logic is tightly coupled to execution engines, these transitions consume hundreds of engineering hours, even when the underlying computation remains identical.
Ibis addresses this by decoupling analytical intent from execution. It provides a mature, backend-agnostic DataFrame API backed by Voltron Data, enabling teams to write transformation logic once and execute it across 20+ different backends, from local DuckDB to production-scale BigQuery, Snowflake, or Spark clusters.
Ibis doesn't replace DuckDB, Polars, or Spark; it extends their usefulness by letting engineers choose when and where those engines run. The same query logic developed locally on DuckDB can deploy to BigQuery without modification, and analytical applications can support multiple customer environments without maintaining a separate codebase for each platform.
Blog author
Niklas Niggemann
Working Student Data & AI
Do you still have questions? Just send me a message.