I have been working as a data scientist at codecentric for several years now. Thus, my language of choice is Python and I use it in several projects on a daily basis. Last year, I got pretty excited about the announcement of the new versions of the Apple M1 chip because they offered much higher performance. Usually, I don’t need to run long trainings of neural networks on my laptop. But for small experiments, and of course debugging, my hope was to save a lot of time. In December last year I was privileged enough to choose a new business laptop, so I took the opportunity to get a Macbook Pro 16 with the M1 Pro chip. The whole installation started smoothly until I wanted to run my Python projects. Then I ran into …
Apple’s M1 chip is built on the ARM architecture, in contrast to the x86_64 (x64) chips used in prior Macbook generations. On the one hand, this made Apple independent from Intel and free to design its own chips. On the other hand, all existing software either needs to be emulated (with Rosetta 2) or recompiled with an architecture-specific compiler for arm64 (M1) instead of x86_64 as before. Unfortunately, a recompilation is unlikely to work out of the box, and code changes must be applied. Although Python is a scripting language, this also holds true for it because the interpreter is written in C. Furthermore, many major packages like NumPy and Pandas use C/C++ extensions for better performance, too. In short: with pip install I was not able to get a running environment with all necessary packages installed. There is the alternative of using Miniforge, a variant of Conda. But this approach lacks the possibility of reproducible environments and therefore was not an option.
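Incidentally, you can check which architecture a given Python interpreter was built for. This is a quick way to tell a native arm64 interpreter apart from an x64 one running under emulation (the output naturally depends on your machine):

```python
import platform

# Print the machine architecture the running interpreter was built for.
# On Apple Silicon a native build reports 'arm64'; an x64 build running
# under Rosetta 2 emulation reports 'x86_64' instead.
print(platform.machine())
```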
In a Python project with multiple people involved, it is crucial that the software environment is consistent across different platforms and systems. One way to achieve this is using a package manager like Poetry (https://python-poetry.org/) for Python. It stores all package dependencies and their exact versions in files tracked with Git, which makes it possible to rerun the installation anywhere and generate the same environment. Besides reproducibility of the environment, another mandatory feature is the ability to debug code easily. Everyone who develops on a regular basis knows how much time it can save to investigate variables and behavior step by step in an interactive manner. And since this is already possible with IDEs like PyCharm or VS Code, I didn’t want to miss out on this feature when changing to a new architecture.
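For context, a minimal Poetry configuration looks roughly like this. The project name, author, and version constraints below are purely illustrative; the exact resolved versions end up pinned in poetry.lock:

```toml
# pyproject.toml (illustrative sketch)
[tool.poetry]
name = "my-project"                  # hypothetical project name
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.8"
numpy = "^1.22"                      # exact versions are pinned in poetry.lock
pandas = "^1.4"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

Both pyproject.toml and poetry.lock are committed to Git, which is what makes the installation reproducible for every team member.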
What didn’t work
The first thing I tried was installing Miniforge and running poetry install, which installs the specified dependencies into a virtualenv. Every time a package couldn’t be installed via pip (which Poetry uses in the background), I tried to install it with conda install. This soon became very complicated, and due to package version restrictions I abandoned this approach. The second attempt was running the terminal in x64 emulation with Rosetta 2, the idea being to install only x64 Python packages. Unfortunately, I didn’t find out how to set the compiler correctly, and it was opaque to me which compilers and which version of Homebrew were being used. Not seeing any way to succeed here, I searched for a different way.
The final approach I tried, and the one that led to success, was using Docker and running the required environments as containers. Each container is also emulated as x64, which makes it possible to install every package just as on prior Macbooks. All commands and steps are provided in the Git repository: https://github.com/JohnDenis/py-poetry-m1 . For brevity, I will only refer to Makefile targets in the description; each target stands for the corresponding command in the Makefile. You find it at the end of this post or in the Github repository. Additionally, Docker Desktop must be installed to get everything running.
- As a first step, you need to run a container from your desired Python base image (here 3.8)
- Now you are in a shell of the container and can install whatever is necessary. Additionally, the project directory is mounted in the container, which allows changes made via Poetry to be stored directly in the correct pyproject.toml and poetry.lock files. For the first installation a script is provided which is triggered by a Makefile target.
- After the initial installation, your goal is to persist the current container as an image to use for every run of your code. Open a second terminal without stopping the container and run make commit_raw there.
- Now it’s possible to use the m1-built:latest image in combination with your favorite IDE to run and debug your Python scripts. Or you can run them from plain shell by running a container with a specific command.
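The steps above could be captured in a Makefile roughly like the following sketch. The target names other than commit_raw, the compose service name, and the base image are my assumptions; see the repository for the real file:

```makefile
# Sketch of the described workflow (names other than commit_raw are assumed)
IMAGE_TAG = m1-built:latest

# Start the x64-emulated base container with the project directory mounted
run_base:
	docker-compose run --rm python bash

# Persist the most recently created container as a reusable image
commit_raw:
	docker commit $$(docker ps -lq) $(IMAGE_TAG)
```

docker commit snapshots the container's current filesystem, which is what turns the interactively built environment into the m1-built:latest image.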
Updating the environment
In some projects it happens rarely, in others more often: the need to change the packages of your environment. To save time, you don’t want to start from scratch and reinstall all packages with every change. This is why this solution provides a way of updating the built Docker image:
- Start a Docker container from the m1-built:latest image by running the corresponding Makefile target.
- Interact with the Python environment as desired (e.g. poetry add or poetry remove).
- Open a second terminal without stopping the current container and run make commit_raw there.
- Now you have a new version stored at the m1-built:latest tag.
Flexibility of the solution
This solution works not only for Poetry-based environments, but for any environment whose packages are installed via pip. The Python version of your environment can be selected in the docker-compose.yml (e.g. 3.8, 3.9, etc.), and everything else can be applied via the command line when connected to the container.
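A docker-compose.yml for this setup could look roughly as follows. The service name and mount path are my assumptions; the essential line is platform: linux/amd64, which forces the x64 image to run under emulation on an M1 host:

```yaml
# Illustrative docker-compose.yml sketch
services:
  python:
    image: python:3.8          # select your Python version here (3.8, 3.9, ...)
    platform: linux/amd64      # run the x64 image under emulation on the M1
    volumes:
      - .:/workspace           # mount the project directory into the container
    working_dir: /workspace
    command: bash
```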
Debugging with PyCharm
With the Docker image m1-built:latest being committed, PyCharm offers a convenient way to run and debug your project scripts.
- Add a new interpreter (Project Settings -> Python Interpreter)
- Add a path mapping from your project root to the directory where the project is mounted inside the container
- Add Run configuration for main.py file
With these steps you create a run configuration that works the same way as with a normal local environment.
Remarks on TensorFlow
The only package for which the original PyPI wheels do not work is TensorFlow. The reason seems to be that the AVX speed-up instructions cannot be emulated. Unfortunately, the workaround isn’t as clean as the plain solution, but currently I don’t know a better one:
- Follow the instructions above and create your environment in a container.
- Compile or download a version of TensorFlow where the AVX instructions are deactivated.
- Replace the original installation with pip install /path/to/tensorflow_wheel.
- Go ahead and commit the container.
As long as not all Python packages are compatible with Apple’s M1 silicon, this solution gives you a great way to run any Python environment on your Apple computer. And even once it is possible to install all packages via pip, I would recommend sticking to this approach because it keeps the software stack encapsulated and reproducible, which saves a lot of time and nerves in the long run. One additional enhancement can be building the Docker image in a CI pipeline. That way, not every team member needs to go through the described steps; instead, you can use the image from your private Docker registry immediately.
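Such a CI step could look roughly like this, sketched as a hypothetical GitHub Actions job. The registry URL is a placeholder, and it assumes a Dockerfile exists that runs poetry install during the build, which is not part of the interactive setup described above:

```yaml
# Illustrative CI sketch (registry and Dockerfile are assumptions)
name: build-image
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build the environment image
        run: docker build -t registry.example.com/m1-built:latest .
      - name: Push to the private registry
        run: docker push registry.example.com/m1-built:latest
```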