I don’t think anyone will be surprised when I say that SQL is not the nicest language to work with. Some might even say that it has terrible ergonomics, especially for larger and more complex queries. Still, there are very good reasons why SQL is the choice for accessing and processing data. Fortunately, DuckDB’s friendly SQL dialect tackles a lot of those issues while retaining the familiar style of classic SQL.
The main issue with maintaining large and complex SQL queries boils down to the syntactical order being very different from the semantic order. This has to be kept in mind when writing queries, which makes unfamiliar queries unintuitive and harder to understand: for example, the logical starting point of a query is usually buried somewhere in the middle, and additional subqueries are necessary to use previous results. It can also impair tool support, for example autocompletion of column names before the `FROM` clause has been written.
To show that this can be improved significantly, I’ll start with a somewhat realistic example in PostgreSQL dialect as a stand-in for classic SQL. Then, I’ll go over some of DuckDB’s friendly SQL features and show how they make the same example easier to read and maintain. But before I go into the details, some notes for additional context:
- Do you prefer a hands-on approach?
  - Check out my GitHub repository to run the examples locally.
  - Or use the web IDE from MotherDuck (free account required) and access my example data with this share:

    ```sql
    ATTACH 'md:_share/duckdb-friendly-sql-example-data/0c374195-c236-4b6f-a8ce-02595d90cba6'
    ```
- DuckDB is not the only dialect that improves on classic SQL; it just happens to be one I like a lot. So the shown features might exist elsewhere, too.
- The shown features are just a personal selection, ranging from subtle to fancy, and are by no means an exhaustive list. You might not like all of them (or my formatting), but since DuckDB is largely compatible with PostgreSQL, you can cherry-pick to your liking.
A somewhat realistic example: Rubber Duck KPIs
Minimal examples are great for showing how a feature works, but aren’t enough to showcase their impact on things like developer experience. So, I tried to create a somewhat realistic example: big enough to be meaningful, but not so large that it becomes unwieldy. This way, I can show how each improvement makes a difference.
My query of choice selects a specific product category, adds some extra properties and aggregates a few KPIs grouped by product and customer country.
```sql
WITH
  enriched AS (
    SELECT
      p.product_id AS duck_id,
      p.product_name AS duck_name,
      COALESCE(p.color, 'Unknown') AS duck_color,
      COALESCE(p.material, 'Rubber') AS duck_material,
      c.country AS customer_country,

      oi.quantity,
      oi.unit_price,
      COALESCE(oi.discount_pct, 0) AS discount_pct,
      p.base_cost,

      (
        COALESCE(oi.discount_pct, 0) >= 0.20
        OR COALESCE(p.color, '') IN ('Gold', 'Platinum')
        OR COALESCE(p.material, '') ILIKE '%silicone%'
      ) AS is_premium_duck
    FROM orders AS o
    JOIN order_items AS oi USING (order_id)
    JOIN products AS p USING (product_id)
    JOIN customers AS c USING (customer_id)
    WHERE p.category = 'Rubber Duck'
  )

SELECT
  duck_id,
  format('%s (%s, %s)', duck_name, duck_color, duck_material) AS duck_description,
  customer_country,
  SUM(quantity * unit_price * (1 - discount_pct)) AS net_revenue,
  SUM(quantity * base_cost) AS cogs,
  SUM((quantity * unit_price * (1 - discount_pct)) - (quantity * base_cost)) AS gross_margin
FROM enriched
WHERE is_premium_duck
GROUP BY duck_id, duck_description, customer_country
ORDER BY duck_id, customer_country;
```
Some points that make this kind of query harder to maintain and reduce developer experience:
- Lots of repeated, similar expressions. This makes it cumbersome to change and easy to introduce inconsistencies.
- Not easy to read from top to bottom. For me, the logical starting point would be the `FROM` clause of the `enriched` CTE.
- I would prefer if the calculation of the individual KPIs were not part of the aggregation in the final projection. This can easily be done by introducing another CTE or some subqueries, but that would add even more repetition (and a more verbose example).
- The last one is a personal preference, but I do not like the default alias syntax. The column name can provide helpful context for understanding an expression, yet `expression AS alias` changes the text directionality to a half-baked right-to-left.
In the following sections, I’ll introduce the friendly SQL features I chose with minimal examples and what the somewhat realistic example looks like with them applied.
Friendlier Syntax
The first features for a friendlier SQL are purely syntactical. Some are subtle, but others can take some getting used to. Especially if you have been working with SQL for a long time, the reading patterns you’ve formed might collide with the changed structure.
FROM-first syntax
This one simply lets you put the `FROM` and `JOIN` clauses before the `SELECT` clause.
```sql
FROM orders o
JOIN customers c USING (customer_id)
SELECT o.order_id, o.order_date, c.country;
```
This makes it a lot easier to read queries from top to bottom. For more details, check out the official documentation.
Trailing commas
Many modern programming languages allow trailing commas in different places, and for good reason. Being able to reorder projections or comment out lines without thinking about it makes working on queries a lot smoother.
```sql
FROM products
SELECT
  product_id,
  product_name,
  base_cost,
;
```
It's something you don't really notice at first, but it is sorely missed once it's gone. Unfortunately, trailing commas aren't supported everywhere: most notably in CTEs, but `ORDER BY` doesn't allow them either.
Prefix aliases
Putting the column alias before its expression is not a subtle change and is absolutely a matter of personal preference.
```sql
FROM products
SELECT
  duck_id: product_id,
  duck_name: product_name,
  duck_color: color,
WHERE
  category = 'Rubber Duck';
```
I find it makes queries easier to read, but I can see how it can complicate formatting and might feel unnatural when you're used to `expression AS alias`. Ultimately, it's more important to pick a style and stick with it than which one is chosen. Like trailing commas, prefix aliases aren't supported everywhere. For example, you can't use them in the star and `COLUMNS` expressions (which I'll cover later).
A friendlier somewhat realistic example
Applying all three of these features to the example query is a solid first step toward a better developer experience.
```sql
WITH
  enriched AS (
    FROM o: orders
    JOIN oi: order_items USING (order_id)
    JOIN p: products USING (product_id)
    JOIN c: customers USING (customer_id)
    SELECT
      duck_id: p.product_id,
      duck_name: p.product_name,
      duck_color: COALESCE(p.color, 'Unknown'),
      duck_material: COALESCE(p.material, 'Rubber'),
      customer_country: c.country,

      oi.quantity,
      oi.unit_price,
      discount_pct: COALESCE(oi.discount_pct, 0),
      p.base_cost,

      is_premium_duck: (
        COALESCE(oi.discount_pct, 0) >= 0.20
        OR COALESCE(p.color, '') IN ('Gold', 'Platinum')
        OR COALESCE(p.material, '') ILIKE '%silicone%'
      ),
    WHERE p.category = 'Rubber Duck'
  )

FROM enriched
SELECT
  duck_id,
  duck_description: format('{0} ({1}, {2})', duck_name, duck_color, duck_material),
  customer_country,
  net_revenue: SUM(quantity * unit_price * (1 - discount_pct)),
  cogs: SUM(quantity * base_cost),
  gross_margin: SUM((quantity * unit_price * (1 - discount_pct)) - (quantity * base_cost)),
WHERE is_premium_duck
GROUP BY duck_id, duck_description, customer_country
ORDER BY duck_id, customer_country;
```
Reusable column aliases
In classic SQL, you can't reuse aliased expressions within the same scope. This leads to lots of repeated code or extra subqueries and CTEs. Complex queries can quickly become hard to maintain, and balancing their length and legibility is not an easy feat.
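For illustration, here is a sketch of what that repetition looks like in classic SQL, using the same hypothetical `order_items` columns as the examples below. Since standard SQL doesn't let the `SELECT` list or the `WHERE` clause see aliases defined in the same scope, every derived value has to be spelled out again wherever it is needed:

```sql
-- Classic SQL: each derived value is repeated in full,
-- because aliases aren't visible within the same scope.
SELECT
  unit_price * (1 - discount_pct) AS discount_price,
  unit_price * (1 - discount_pct) * quantity AS total_price,
  unit_price * (1 - discount_pct) * quantity * (1 + tax_rate) AS taxed_price
FROM order_items
WHERE unit_price * (1 - discount_pct) * quantity * (1 + tax_rate)
  - unit_price * (1 - discount_pct) * quantity > 5;
```

Changing the discount formula now means touching five places instead of one.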
The way DuckDB allows using aliases directly is a major reason why I consider it a game changer. It enables us to think in a more direct fashion when constructing queries. Chaining calculations together in small increments feels very smooth and is easier to follow.
```sql
FROM order_items
SELECT
  discount_price: unit_price * (1 - discount_pct),
  total_price: discount_price * quantity,
  taxed_price: total_price * (1 + tax_rate),
WHERE
  taxed_price - total_price > 5;
```
But minimal examples like this undersell the difference this makes, and even the somewhat realistic example is, in my opinion, too small to really show it. So I suggest everyone take their SQL monster of the week and see what changes when aliases can be reused.
```sql
WITH
  enriched AS (
    FROM o: orders
    JOIN oi: order_items USING (order_id)
    JOIN p: products USING (product_id)
    JOIN c: customers USING (customer_id)
    SELECT
      duck_id: p.product_id,
      duck_name: p.product_name,
      duck_color: COALESCE(p.color, 'Unknown'),
      duck_material: COALESCE(p.material, 'Rubber'),
      customer_country: c.country,

      duck_discount: COALESCE(oi.discount_pct, 0),
      net_revenue_line: oi.quantity * oi.unit_price * (1 - duck_discount),
      cogs_line: oi.quantity * p.base_cost,
      gross_margin_line: net_revenue_line - cogs_line,

      is_premium_duck: duck_discount >= 0.20 OR duck_color IN ('Gold', 'Platinum') OR duck_material ILIKE '%silicone%',
    WHERE p.category = 'Rubber Duck'
  )

FROM enriched
SELECT
  duck_id,
  duck_description: format('{0} ({1}, {2})', duck_name, duck_color, duck_material),
  customer_country,
  net_revenue: SUM(net_revenue_line),
  cogs: SUM(cogs_line),
  gross_margin: SUM(gross_margin_line),
WHERE is_premium_duck
GROUP BY duck_id, duck_description, customer_country
ORDER BY duck_id, customer_country;
```
A few differences I want to highlight:
- Reusing aliases in the same `SELECT` statement makes the `enriched` CTE much more concise.
  - Columns like `duck_color` and `duck_material` can be reused, making it feasible to put the `is_premium_duck` expression on a single line.
  - The KPIs can be calculated step by step, without needing an intermediate subquery.
- Moving the KPIs to the `enriched` CTE improves separation of concerns.
  - `enriched` becomes a collection of self-contained, pre-processed data points.
  - The final projection builds the target structure and calculates the desired aggregations.
Star & COLUMNS expression
The previous two categories were more on the low-key end of things, so I wanted my last pick to be a bit fancy. DuckDB's star and `COLUMNS` expressions are powerful additions. But, as with any powerful tool, overusing them can make things worse rather than better. And, as mentioned earlier, prefix aliases do not work here, so the classic alias syntax has to be used.
Extended star expression capabilities
A shorthand to select every column of a table is useful, but what if you need every column except one or two? In classic SQL, you would have to list all the columns you want. DuckDB makes this easier by letting you use `EXCLUDE`, `REPLACE`, and `RENAME` clauses with the star expression.
```sql
-- EXCLUDE
FROM products
SELECT * EXCLUDE(category)
WHERE category = 'Rubber Duck';

-- REPLACE
FROM order_items
SELECT * REPLACE(
  lpad(order_item_id::VARCHAR, 4, '0') AS order_item_id,
  lpad(product_id::VARCHAR, 4, '0') AS product_id,
);

-- RENAME
FROM customers
SELECT * RENAME(customer_name AS full_name);

-- Combination of all
FROM customers
SELECT
  * EXCLUDE(country)
    REPLACE('000' || customer_id AS customer_id)
    RENAME(customer_name AS full_name),
;
```
This isn’t everything that has been added to the star expression. You can find all features in the documentation.
COLUMNS expression
The extended star expression is already powerful, but the `COLUMNS` expression takes things even further. For example, you can apply the same expression to all matching columns, or even rename columns via regular expressions. The latter is something I would absolutely mark as "danger territory".
```sql
FROM products
SELECT lower(
  COLUMNS(* EXCLUDE (product_id, base_cost))
);

FROM order_items
SELECT COLUMNS('(.+)_id') AS '\1';
```
There’s a lot more you can do with this. Check out the documentation for all the details.
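One more capability worth a quick sketch before moving on: if I recall the documentation correctly, `COLUMNS` also accepts a lambda that filters columns by name, which can read friendlier than a regular expression. A sketch against the example's `order_items` table:

```sql
-- Select every column whose name ends in '_id',
-- using a lambda on the column name instead of a regex.
FROM order_items
SELECT COLUMNS(c -> c LIKE '%_id');
```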
A fancy somewhat realistic example
Applying those features to the somewhat realistic example isn’t straightforward, so I had to introduce an additional CTE. This makes it somewhat more artificial, but it should be easy enough to imagine reasons why a query evolves this way over time.
```sql
WITH
  enriched AS (
    FROM o: orders
    JOIN oi: order_items USING (order_id)
    JOIN p: products USING (product_id)
    JOIN c: customers USING (customer_id)
    SELECT
      duck_id: p.product_id,
      duck_name: p.product_name,
      duck_color: COALESCE(p.color, 'Unknown'),
      duck_material: COALESCE(p.material, 'Rubber'),
      customer_country: c.country,

      duck_discount: COALESCE(oi.discount_pct, 0),
      net_revenue_line: oi.quantity * oi.unit_price * (1 - duck_discount),
      -- tax_collected_line: net_revenue_line * COALESCE(oi.tax_rate, 0),
      cogs_line: oi.quantity * p.base_cost,
      gross_margin_line: net_revenue_line - cogs_line,

      is_premium_duck: duck_discount >= 0.20 OR duck_color IN ('Gold', 'Platinum') OR duck_material ILIKE '%silicone%',
    WHERE p.category = 'Rubber Duck'
  ),
  premium_with_description AS (
    FROM enriched
    SELECT
      * EXCLUDE (duck_name, duck_color, duck_material),
      duck_description: format('{0} ({1}, {2})', duck_name, duck_color, duck_material),
    WHERE is_premium_duck
  )

FROM premium_with_description
SELECT
  duck_id,
  duck_description,
  customer_country,
  SUM(COLUMNS('(.*)_line')) AS '\1',
GROUP BY duck_id, duck_description, customer_country
ORDER BY duck_id, customer_country;
```
Being able to pick'n'mix columns with less verbosity is nice to have, but adding a new KPI by simply adding a column named `<KPI name>_line` to `enriched` is very fancy. It's also a bit scary, because it significantly increases the potential for side effects. In comparison, a similar query in classic SQL is about 15 lines (35%) longer and requires touching every part of the query to add a new KPI.
Wrapping up
I hope this has given you a sense of why I think DuckDB’s friendly SQL is a game changer for developers and data engineers once the code grows beyond a few lines. If you’re considering DuckDB for your next project, keep in mind that it is explicitly scoped as a simple and fast DBMS. Something else is needed for orchestration, distributed workloads, scalability and other big-project needs. One option is MotherDuck, which is the only tool I know of that has DuckDB at its core. So it might be the right one for you.
Also, this has been just a glimpse of the features DuckDB offers, ranging from nice things making SQL friendlier to having completely new possibilities. Here is an arbitrary list of things, with links to their documentation, that didn’t fit in the scope:
- `QUALIFY` clause for filtering immediately on window function results.
- `GROUP BY ALL` and `ORDER BY ALL` so you can omit the columns entirely.
- First-class support for complex data structures like lists, structs & maps, including list comprehension and slicing.
- `CREATE OR REPLACE` as a shorthand for overwriting objects.
- Lambda functions to turn SQL on its head.
- Additional join types like `ASOF` and `POSITIONAL`.
- Recursive CTEs with `USING KEY`, allowing access to results beyond the previous iteration.
- Dot operator for function chaining so statements don't have to end in five closing brackets.
- And everything else in the friendly SQL docs.
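To give a small taste of the first two items, here is a sketch against the example's tables; it assumes `orders` has `customer_id` and `order_date` columns, and `order_items` has `product_id` and `quantity`:

```sql
-- QUALIFY: keep only each customer's most recent order,
-- filtering directly on the window function result.
FROM orders
SELECT
  customer_id,
  order_date,
  rn: row_number() OVER (PARTITION BY customer_id ORDER BY order_date DESC),
QUALIFY rn = 1;

-- GROUP BY ALL: group by every non-aggregated column
-- without listing them again.
FROM order_items
SELECT
  product_id,
  total_quantity: SUM(quantity),
GROUP BY ALL;
```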
Is there anything on the list above you’d like to hear more about? Did I miss an obvious DuckDB feature that would improve the somewhat realistic example? Do you think my SQL formatting is awful and prefix aliases shouldn’t exist? Let me know and I’ll see what I can do.
Blog author
Lennart Hensler
Do you still have questions? Just send me a message.