Ibis: Selecting the Right Execution Engine Without Rewriting Your Logic
In our previous benchmarks, DuckDB consistently outperformed Polars and Pandas on large analytical workloads, but performance comparisons miss a critical question: what happens when you need to move from local DuckDB development to a BigQuery production environment, or migrate an entire data platform from Spark to Snowflake? Rewriting transformation logic because infrastructure changed is expensive and avoidable.
Modern analytical engines are highly capable, but once multiple tools prove "fast enough" for real-world use, the engineering challenge shifts from raw speed to flexibility. Data teams prototype locally, deploy to cloud warehouses, and later migrate platforms for cost, scalability, or operational reasons. When analytical logic is tightly coupled to a specific engine, these transitions consume hundreds of engineering hours, even when the underlying computation remains conceptually identical.
This article explores how Ibis decouples analytical intent from execution engines, enabling teams to write transformation logic once and run it across different backends without costly rewrites. Rather than competing on performance alone, Ibis competes on portability, maintainability, and architectural adaptability.
The Portability Problem: When Databases Don't Speak the Same Language
Data engineering operates across two dominant paradigms, Python DataFrames and SQL, and both face the same structural problem: every database system exposes its own APIs and SQL dialects. This fragmentation creates concrete portability barriers.
These differences accumulate rapidly. A team migrating multiple SQL pipelines might spend weeks rewriting dialect-specific queries, testing edge cases, and debugging subtle semantic differences. Ibis addresses this by providing a consistent interface for expressing analytical intent while translating to backend-specific implementations automatically.
Historical Context
Pandas was designed in 2008 for interactive, in-memory analysis on datasets that fit in RAM. As analytical workloads increasingly involve multi-terabyte datasets and distributed execution, the tight coupling between Pandas' API and eager execution became a constraint.
Ibis was designed to address this by separating analytical intent from execution strategy, an approach influenced by dplyr and the R ecosystem, where the DataFrame abstraction originated.
Why Decoupling Logic from Execution Matters
Ibis is built on a core architectural principle: analytical intent should be defined independently of the execution engine. This separation delivers concrete engineering and business value. Logic tightly coupled to production infrastructure is difficult to test locally; engineers wait for CI/CD pipelines or develop against expensive cloud sandboxes. Portable queries run identically on local DuckDB and production BigQuery, enabling fast iteration with confidence that local results match production behavior. And as systems span local machines, cloud warehouses, and distributed compute platforms, portable query logic allows teams to optimize infrastructure choices (cost, performance, compliance requirements) without disrupting analytical workflows.
Ibis: A Backend-Agnostic DataFrame API
Ibis provides a DataFrame-style API that is designed to be independent of any specific execution engine. While the syntax resembles Pandas, operations are not executed eagerly. Instead, Ibis builds a symbolic representation of the query.
In many local setups, Ibis uses DuckDB as an embedded backend by default, making it convenient for experimentation and prototyping. The same query logic can later be executed on distributed systems or cloud data warehouses with minimal changes.
```python
import ibis

con = ibis.connect("duckdb://")
t = con.read_csv("data.csv")

expr = (
    t.filter(t.value > 100)
    .group_by(t.category)
    .aggregate(total=t.value.sum())
)

expr.execute()
```
No computation occurs until `execute()` is called. Until then, Ibis only builds a symbolic representation of the intended transformation.
Execution Model: What Ibis Does and Doesn't Do
Ibis does not execute queries itself. It functions as a query compiler that translates DataFrame operations into backend-specific SQL and delegates execution entirely to the target engine. Query planning, optimization, and execution are handled by the backend (DuckDB, BigQuery, Spark, etc.).
Ibis performance is effectively DuckDB performance (or BigQuery performance, or Snowflake performance). The translation layer adds minimal overhead. Once the SQL is generated, execution speed depends entirely on the backend's capabilities.
Not all backends support all operations. Ibis maintains a compatibility matrix showing which operations work on which engines. In practice, common analytical operations (filters, aggregations, joins, window functions) are well-supported across major backends.
Portability as a Core Feature
Switching execution engines requires only a change in the connection configuration:
```python
ibis.connect("duckdb://")
ibis.connect("polars://")
ibis.connect("pyspark://")
ibis.connect("bigquery://")
```
Backend support varies in maturity, but the abstraction remains consistent. A workflow can be developed locally with DuckDB and later executed on BigQuery or Spark without rewriting the transformation logic.
SQL Generation and Dialect Translation
Most Ibis backends generate SQL and rely on SQLGlot for cross-dialect translation. SQLGlot translates queries to conform to the target engine's syntax, while optimization remains the database's responsibility.
For transparency and debugging, Ibis allows users to inspect generated SQL:
```python
print(ibis.to_sql(expr))
```
DataFrame expressions can be combined with raw SQL where needed, providing escape hatches when the abstraction doesn't fit specific use cases.
Apache Arrow and Interoperability
Supporting multiple backends would be far more difficult without Apache Arrow, a standardized in-memory columnar format. Arrow enables efficient data exchange between engines and client libraries.
Arrow enables zero-copy conversion when engines share compatible memory layouts, for example, DuckDB ↔ Polars or PyArrow ↔ DuckDB. Data can be passed between systems without serialization or copying, dramatically reducing overhead.
Pandas conversions often still require copies due to its NumPy-backed memory model, which predates Arrow's design. The pandas 2.x series introduced optional Arrow-backed dtypes (pd.ArrowDtype) that enable zero-copy interoperability, but adoption requires explicit opt-in.
Although largely invisible to end users, Arrow is a foundational component enabling Ibis to move data between backends efficiently.
Developer Experience
Ibis integrates naturally with standard Python workflows:
- Tables convert to Pandas, Polars, or PyArrow with `.to_pandas()`, `.to_polars()`, or `.to_pyarrow()`
- Queries are lazy by default; `.execute()` triggers computation
- DataFrame expressions combine freely with raw SQL via the `.sql()` method
- Python testing frameworks (pytest, unittest) validate analytical logic locally
Maturity and Ecosystem
Ibis is a mature, production-ready project backed by Voltron Data (the company behind Apache Arrow). Originally created by Wes McKinney (creator of Pandas) in 2015, Ibis has been in active development for nearly a decade.
- Corporate backing: Voltron Data provides full-time engineering resources
- Community: Active development community, responsive maintainers, regular releases
- Production usage: Used by companies including Bloomberg, RStudio/Posit, and various data teams
- Backend support: 20+ backends with varying maturity levels (DuckDB, BigQuery, Snowflake, Postgres, Spark are well-supported)
Ibis offers production-grade stability and ecosystem maturity, making it a reliable choice for analytical portability needs.
Conclusion
Our benchmarks showed DuckDB consistently outperforming Polars and Pandas for large analytical workloads. But performance alone doesn't determine architectural success.
Data teams prototype locally, deploy to cloud warehouses, and migrate platforms for cost, scalability, and operational reasons. When analytical logic is tightly coupled to execution engines, these transitions consume hundreds of engineering hours, even when the underlying computation remains identical.
Ibis addresses this by decoupling analytical intent from execution. It provides a mature, backend-agnostic DataFrame API backed by Voltron Data, enabling teams to write transformation logic once and execute it across 20+ different backends, from local DuckDB to production-scale BigQuery, Snowflake, or Spark clusters.
Ibis doesn't replace DuckDB, Polars, or Spark; it extends their usefulness by letting engineers choose when and where those engines run. The same query logic developed locally on DuckDB can deploy to BigQuery without modification, and analytical applications can support multiple customer environments without maintaining a separate codebase for each platform.
Blog author
Niklas Niggemann
Working Student Data & AI
Do you still have questions? Just send me a message.