Querying Databricks Delta Tables in Motherduck

25.4.2025 | 3 minutes reading time

Intro

In a previous article, my colleague Matthias Niehoff demonstrated how duckdb can serve as a viable alternative to Spark for processing data stored in Databricks, specifically by directly accessing the Unity Catalog.

Building on that, the next step is to investigate the integration of Databricks and Motherduck: is it feasible and practical to query data stored in the Databricks Unity Catalog from Motherduck? In this case, the Unity Catalog is set up in Azure, where Databricks is just another easily provisioned service.

Setting up the access

The uc_catalog extension for duckdb allows attaching the Unity Catalog as a database in Motherduck. To access the individual Delta Tables inside the Unity Catalog, the delta and azure extensions are also required.

A Personal Access Token (PAT) and the workspace URL for Databricks must be provided so that the uc_catalog extension can access the Unity Catalog. The PAT is used in the Authorization header for calls to the Databricks API. Afterwards, the Unity Catalog can be attached like a regular database in Motherduck.

Access to the Delta Tables inside the Unity Catalog is granted via temporary table credentials, which allow short-term external data access. On Azure, Databricks generates a SAS token that is used as this credential. The uc_catalog extension handles this automatically for S3, but not for Azure.

To obtain the token manually, the /api/2.0/unity-catalog/temporary-table-credentials endpoint of the Databricks API must be used. Obtaining the SAS token requires the PAT, the table UUID and the allowed operation (either READ or READ_WRITE). The Python SDK also supports this feature. The generated token is issued by the Databricks environment and is tied to the specific table. It is mandatory that the EXTERNAL USE SCHEMA permission is granted at catalog level.
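As a sketch of this manual step, the following Python snippet requests a temporary credential for a single table via the endpoint mentioned above. The workspace URL, PAT and table UUID are placeholders, and the response field names (such as azure_user_delegation_sas) follow the current Databricks REST API reference, so verify them against your environment:

import requests

# Placeholders: workspace URL, PAT and table UUID come from your own Databricks
# environment (the UUID can be looked up in the Databricks UI or via the
# Unity Catalog tables API).
DATABRICKS_HOST = "https://<databricks instance base url>"
PAT = "<personal access token>"
TABLE_ID = "<table uuid>"

response = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/unity-catalog/temporary-table-credentials",
    headers={"Authorization": f"Bearer {PAT}"},
    json={"table_id": TABLE_ID, "operation": "READ"},
)
response.raise_for_status()
credentials = response.json()

# For a table backed by ADLS, the response is expected to contain an Azure
# user delegation SAS plus the abfss:// URL of the table location.
sas_token = credentials["azure_user_delegation_sas"]["sas_token"]
table_url = credentials["url"]
print(table_url)

The returned SAS token and table URL are exactly the values needed for the Azure secret created in the next section.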

Delta Tables in Motherduck

Data can be loaded into Motherduck for further processing by following the steps below. For this example, the local duckdb UI was used to connect to the Motherduck instance, and we attach the samples catalog that is available in every Databricks workspace.

Connect to Motherduck

ATTACH 'md:';

Add required extensions

INSTALL uc_catalog;
LOAD uc_catalog;
LOAD delta;

Create Unity Catalog secret and Azure secret, attach Unity Catalog as database

SET azure_transport_option_type = 'curl';
CREATE OR REPLACE SECRET uc_catalog (
    TYPE UC,
    TOKEN '<personal access token>',
    ENDPOINT '<databricks instance base url>'
);
CREATE OR REPLACE SECRET azure (
    TYPE AZURE,
    CONNECTION_STRING 'AccountName=<Azure storage account name>;SharedAccessSignature=<SAS token>'
);
ATTACH 'samples' AS samples (TYPE UC_CATALOG);

The Azure storage account name is part of the URL returned by the Databricks API call: it is the segment between the @ symbol and .dfs.core.windows.net. It can also be found in the details view of a catalog item in the Databricks UI.
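As a small illustration with made-up values, the account name can be cut out of such a URL like this:

# Example URL shape returned by the temporary credentials call (made-up values):
table_url = "abfss://metastore@examplestorageacct.dfs.core.windows.net/some/table/path"
# The storage account name sits between the "@" and ".dfs.core.windows.net".
account_name = table_url.split("@", 1)[1].split(".dfs.core.windows.net", 1)[0]
print(account_name)  # -> examplestorageacct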

Check access

SHOW ALL TABLES;

Following these steps, SHOW ALL TABLES should display the Delta Tables from Databricks within the Motherduck UI. Queries on a table work through the Azure secret that holds its SAS token. Accessing multiple Delta Tables therefore requires one Azure secret per table, each scoped so that it is only used for data access to the corresponding table; a sketch of such a scoped secret follows below.
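The following sketch creates such a per-table secret via the duckdb Python client. The secret name, storage paths and SAS token are placeholders, samples.nyctaxi.trips is one of the tables in the Databricks sample catalog, and the SCOPE parameter of DuckDB secrets is used here to restrict the secret to one table's path prefix:

import duckdb

# Connect to Motherduck (assumes a valid motherduck_token in the environment).
con = duckdb.connect("md:")

# One Azure secret per Delta Table, scoped to that table's storage path, so the
# matching SAS token is used for each table. All values are placeholders.
con.execute("""
CREATE OR REPLACE SECRET azure_trips (
    TYPE AZURE,
    CONNECTION_STRING 'AccountName=<storage account>;SharedAccessSignature=<SAS token for the trips table>',
    SCOPE 'abfss://<container>@<storage account>.dfs.core.windows.net/<path to the trips table>'
);
""")

# With the catalog attached and the secret in place, the table can be queried like any other.
print(con.sql("SELECT count(*) FROM samples.nyctaxi.trips").fetchall())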

Ideally, if the uc_catalog extension supported Azure credentials natively, the temporary table credentials could be refreshed directly by the extension, making the separate Azure secret obsolete.

Outlook

While initial access to Delta Tables can be established relatively easily, providing a potentially faster alternative for specific analytics tasks, this approach has obvious limitations. The need to frequently renew the short-lived temporary table credentials limits the usability of Motherduck with Databricks. Full support for S3 is already available; GCP and Azure support would require adaptations of the credential handling inside the extension and are tracked as open issues on GitHub.

[Figure: Motherduck Databricks integration overview]

In the end, the uc_catalog extension is still a proof of concept. I hope the open Azure authorization issue will be addressed so that this approach can be revisited; the extension could then make it possible to load data from Databricks into Motherduck with only a small initial overhead. At the moment, the integration still requires glue code for the credential handling, which could become obsolete in favour of a plain SQL solution.
