I have been working as a data scientist at codecentric for several years now. Thus, my language of choice is Python and I use it in several projects on a daily basis. Last year, I got pretty excited about the announcement of the new versions of the Apple M1 chip because they offered much higher performance. Usually, I don’t need to run long trainings of neural networks on my laptop. But for small experiments, and of course debugging, my hope was to save a lot of time. In December last year I was privileged enough to choose a new business laptop, so I took the opportunity to get a Macbook Pro 16 with the M1 Pro chip. The whole installation started smoothly until I wanted to run my Python projects. Then I ran into …
Apple’s M1 chip is built on the ARM architecture, in contrast to the x86_64 (x64) chips used in prior Macbook generations. On the one hand, this made Apple independent from Intel and free to design its own chips. On the other hand, all existing software either needs to be emulated (with Rosetta 2) or recompiled with an architecture-specific compiler for arm64 (M1) instead of x86_64 as before. Unfortunately, a recompilation is unlikely to work out of the box, and code changes must be applied. Although Python is a scripting language, this also holds true for it because the interpreter is written in C. Furthermore, many major packages like NumPy and Pandas use C/C++ extensions for better performance, too. In short: with pip install I was not able to get a running environment with all necessary packages installed. There is the alternative of using Miniforge, a variant of Conda. But this approach lacks the possibility of reproducible environments and therefore was not an option.
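Incidentally, you can check which architecture a given Python interpreter was built for. This is a quick way to tell a native arm64 interpreter apart from an x64 one running under emulation (the output naturally depends on your machine):

```python
import platform

# Print the machine architecture the running interpreter was built for.
# On Apple Silicon a native build reports 'arm64'; an x64 build running
# under Rosetta 2 emulation reports 'x86_64' instead.
print(platform.machine())
```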
In a Python project with multiple people involved, it is crucial that the software environment is consistent across different platforms and systems. One way to achieve this is using a package manager like Poetry (https://python-poetry.org/) for Python. It stores all package dependencies and their exact versions in files tracked with Git, which makes it possible to rerun the installation anywhere and generate the same environment. Besides reproducibility of the environment, another mandatory feature is the ability to debug code easily. Everyone who develops on a regular basis knows how much time it can save to investigate variables and behavior step by step in an interactive manner. And since this is already possible with IDEs like PyCharm or VS Code, I didn’t want to miss out on this feature when changing to a new architecture.
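For context, a minimal Poetry configuration looks roughly like this. The project name, author, and version constraints below are purely illustrative; the exact resolved versions end up pinned in poetry.lock:

```toml
# pyproject.toml (illustrative sketch)
[tool.poetry]
name = "my-project"                  # hypothetical project name
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.8"
numpy = "^1.22"                      # exact versions are pinned in poetry.lock
pandas = "^1.4"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

Both pyproject.toml and poetry.lock are committed to Git, which is what makes the installation reproducible for every team member.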
What didn’t work
The first thing I tried was installing Miniforge and running poetry install, which installs the specified dependencies into a virtualenv. Every time a package couldn’t be installed via pip (which Poetry uses in the background), I tried to install it with conda install. This soon became very complicated, and due to package version restrictions I abandoned this approach. The second attempt was running the terminal in x64 emulation with Rosetta 2, the idea being to install only x64 Python packages. Unfortunately, I didn’t find out how to set the compiler correctly, and it was opaque to me which compilers and which version of Homebrew were being used. Not seeing any way to succeed here, I searched for a different way.
The final approach I tried, and the one that led to success, was using Docker and running the required environments as containers. Each container is also emulated as x64, which makes it possible to install every package just as on prior Macbooks. All commands and steps are provided in the Git repository: https://github.com/JohnDenis/py-poetry-m1 . For brevity, I will only refer to Makefile targets in the description; each target stands for the corresponding command in the Makefile. You find it at the end of this post or in the Github repository. Additionally, Docker Desktop must be installed to get everything running.
- As a first step, you need to run a container from your desired Python base image (here 3.8)
- Now you are in a shell of the container and can install whatever is necessary. Additionally, the project directory is mounted in the container, which allows changes made via Poetry to be stored directly in the correct pyproject.toml and poetry.lock files. For the first installation a script is provided which is triggered by a Makefile target.
- After the initial installation, your goal is to persist the current container as an image to use for every run of your code. Open a second terminal without stopping the container and run make commit_raw there.
- Now it’s possible to use the m1-built:latest image in combination with your favorite IDE to run and debug your Python scripts. Or you can run them from plain shell by running a container with a specific command.
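The steps above could be captured in a Makefile roughly like the following sketch. The target names other than commit_raw, the compose service name, and the base image are my assumptions; see the repository for the real file:

```makefile
# Sketch of the described workflow (names other than commit_raw are assumed)
IMAGE_TAG = m1-built:latest

# Start the x64-emulated base container with the project directory mounted
run_base:
	docker-compose run --rm python bash

# Persist the most recently created container as a reusable image
commit_raw:
	docker commit $$(docker ps -lq) $(IMAGE_TAG)
```

docker commit snapshots the container's current filesystem, which is what turns the interactively built environment into the m1-built:latest image.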
Updating the environment
In some projects it happens rarely, in others more often: the need to change the packages of your environment. To save time, you don’t want to start from scratch and reinstall all packages with every change. This is why this solution provides a way of updating the built Docker image:
- Start a Docker container from the m1-built:latest image by running the corresponding Makefile target.
- Interact with the Python environment as desired (e.g. poetry add or poetry remove).
- Open a second terminal without stopping the current container and run make commit_raw there.
- Now you have a new version stored at the m1-built:latest tag.
Flexibility of the solution
This solution works not only for Poetry-based environments, but for any environment whose packages are installed via pip. The Python version of your environment can be selected in the docker-compose.yml (e.g. 3.8, 3.9, etc.), and everything else can be applied via the command line when connected to the container.
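A docker-compose.yml for this setup could look roughly as follows. The service name and mount path are my assumptions; the essential line is platform: linux/amd64, which forces the x64 image to run under emulation on an M1 host:

```yaml
# Illustrative docker-compose.yml sketch
services:
  python:
    image: python:3.8          # select your Python version here (3.8, 3.9, ...)
    platform: linux/amd64      # run the x64 image under emulation on the M1
    volumes:
      - .:/workspace           # mount the project directory into the container
    working_dir: /workspace
    command: bash
```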
Debugging with PyCharm
With the Docker image m1-built:latest being committed, PyCharm offers a convenient way to run and debug your project scripts.
- Add a new interpreter (Project Settings -> Python Interpreter)
- Add a path mapping from your project root to the directory where the project is mounted inside the container
- Add Run configuration for main.py file
With these steps you create a run configuration that works the same way as with a normal local environment.
Remarks on TensorFlow
The only package for which the original PyPI wheels do not work is TensorFlow. The reason seems to be that the AVX speed-up instructions cannot be emulated. Unfortunately, the workaround isn’t as clean as the plain solution, but currently I don’t know a better one:
- Follow the instructions above and create your environment in a container.
- Compile or download a version of TensorFlow where the AVX instructions are deactivated.
- Replace the original installation with pip install /path/to/tensorflow_wheel.
- Go ahead and commit the container.
As long as not all Python packages are compatible with Apple’s M1 silicon, this solution gives you a great way to run any Python environment on your Apple computer. And even once it is possible to install all packages via pip, I would recommend sticking to this approach because it keeps the software stack encapsulated and reproducible, which saves a lot of time and nerves in the long run. One additional enhancement can be building the Docker image in a CI pipeline. That way, not every team member needs to go through the described steps; instead, you can use the image from your private Docker registry immediately.
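Such a CI step could look roughly like this, sketched as a hypothetical GitHub Actions job. The registry URL is a placeholder, and it assumes a Dockerfile exists that runs poetry install during the build, which is not part of the interactive setup described above:

```yaml
# Illustrative CI sketch (registry and Dockerfile are assumptions)
name: build-image
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build the environment image
        run: docker build -t registry.example.com/m1-built:latest .
      - name: Push to the private registry
        run: docker push registry.example.com/m1-built:latest
```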