Physical regression testing for the Thermomix

31.3.2020 | 8 minutes reading time

Automating physical regression testing of products with computer vision and robotics

Testing a physical product can be a highly manual task. Advances in Deep Learning and computer vision now allow us to aim for a much higher degree of automation.

We have worked on a fully integrated approach to perform physical tests of a device and funnel the feedback right back to the development team. This is done in a way that integrates performance and quality-oriented tests directly with the build pipeline. Regression testing of physical devices with new software can be quite painful since you need to ‘walk’ through all functionalities and the tester needs to be ‘awake’ 🙂 to spot even minor deviations. Furthermore, they and their peers need to have a good and common understanding of the desired behavior.

You will find further related content on our landing page codecentric.ai, in our YouTube channel and in our free Deep Learning Bootcamp.

Thermomix as an example case for physical testing

Our example case is based on the Vorwerk Thermomix, a multifunctional food processor. This universal cooking machine receives over-the-air updates on a regular basis.

Fig.1 Thermomix – Source: Vorwerk

The aspect we focus on in this article is the upgrade or boot process. Changes to the software can have a significant impact on the duration of the update/boot process, e.g. by changing the time needed to download the image, write a data blob to flash memory or upgrade drivers. Depending on the tasks involved, the upgrade might also affect the user interface and therefore the customer experience. For example, if the screen driver is disabled during reboot or flashing, the screen and the status LEDs go dark. If this state lasts too long, the customer could mistakenly think that the update has failed and maybe even unplug the device during this critical phase, following the motto “Have you tried turning it off and on again?” 😉

Since this device manufacturer is focused on optimizing the customer experience, conditions like this should be avoided altogether.

The interface is fully tested during the development process, but testing delivered devices is a challenge. Because the images of the embedded devices are encrypted, production devices cannot be debugged easily. Devices cannot be downgraded for security reasons, and the software has to be deployed through the production software delivery stack. So far, each update has had to be walked through by hand and checked upon deployment.

Goals for an AI driven physical regression testing process

To obtain a stable, repeatable regression testing process for physical devices, we defined the specific aspects and functionalities that we want to achieve:

identification of upgrade/boot-process phase
generation of upgrade timing diagrams
acquisition of test videos
augmentation of test videos with boot phase ID
integration of results into the build pipeline
comparison of test with development KPIs
integration of an actor to physically click through the process

As a side requirement, we wanted to build a fully serverless setup on the AWS stack.

Using Deep Learning to detect update and boot phases

At the core of the test chain is an understanding of the screen contents and the behavior of the device. We used Deep Learning algorithms to classify the current screen contents. To get started as quickly as possible, we used existing videos to train our classifier: we drew a randomized sample of frames from videos showing update processes, annotated them and used them as the training set. The boot/update phases are divided into eight classes.

Fig.2 Screen outputs during the update process map to classes

The resulting model was deployed as an AWS SageMaker inference endpoint that receives a frame as input, classifies it and returns the identified boot/update phase. As of now, the approach uses a Convolutional Neural Network (CNN; you can find follow-up reading in our blog), more specifically a ResNet. We deployed the inference endpoint on an ml.p2.xlarge instance, which is one of the smaller GPU-accelerated instances. The selected instance size was sufficient to work through the videos in near real time.
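As an illustration of what a single classification request could look like, here is a minimal sketch. It assumes the endpoint accepts a JPEG image with content type `application/x-image` and answers with a JSON list containing one probability per class; the article does not state the actual request or response format. The phase names in `PHASES` are hypothetical as well, since only the number of classes (eight) is given.

```python
import json

# Hypothetical phase names; the article only states that there are eight classes.
PHASES = [
    "idle", "download", "inflate", "flash",
    "black_screen", "reboot", "boot_logo", "home_screen",
]


def classify_frame(client, endpoint_name, jpeg_bytes):
    """Send one JPEG frame to the endpoint and return (phase, probability)."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",
        Body=jpeg_bytes,
    )
    # Assumed response format: a JSON list with one probability per class.
    probabilities = json.loads(response["Body"].read())
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return PHASES[best], probabilities[best]
```

With boto3, the `client` argument would be created via `boto3.client("sagemaker-runtime")`. Passing the client in as a parameter keeps the function testable without AWS access.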

Fig.3 Sagemaker Classification and resulting artifacts

A perspective for further development is to detect input fields, buttons and output in order to navigate dynamically through known (regression tests) or unknown (exploratory tests) interfaces. The acquired positions of the GUI components are used as coordinate input and mapped through a translation matrix into coordinates for the robot arm that operates the device under test. Furthermore, it is possible to build a closed feedback loop that validates the inputs dialed in with the input wheel.
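The mapping from screen coordinates to robot coordinates can be sketched as follows. This is a minimal example that assumes a planar setup in which a single 3x3 homogeneous matrix is sufficient. The calibration values and the bounding box are made up for illustration, and the function names are not taken from the original project.

```python
def apply_transform(matrix, point):
    """Map a 2D point through a 3x3 homogeneous transformation matrix."""
    x, y = point
    vec = (x, y, 1.0)
    tx, ty, w = (sum(m * v for m, v in zip(row, vec)) for row in matrix)
    return tx / w, ty / w


def bbox_center(bbox):
    """Center of a detection box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = bbox
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0


# Hypothetical calibration: 0.25 mm per pixel, image origin located at
# (120 mm, 40 mm) in the robot frame, image y axis opposite to the robot y axis.
PIXEL_TO_ROBOT = [
    [0.25, 0.0, 120.0],
    [0.0, -0.25, 40.0],
    [0.0, 0.0, 1.0],
]

button_box = (300, 100, 340, 140)  # made-up detector output in pixels
target = apply_transform(PIXEL_TO_ROBOT, bbox_center(button_box))
print(target)  # (200.0, 10.0)
```

In practice the matrix would come from a calibration step, for example by touching a few known screen positions with the arm and solving for the matrix entries.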

Fig.4 Localization of GUI elements by object detection

As a result, we generate frame-exact timing diagrams that show the exact update flow and the duration of the phases. The fully annotated videos of the tests are stored in order to be able to view the details of the test later on. All test artifacts are accessible through the build pipeline output.
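To make the step from per-frame classifications to a timing diagram concrete, here is a small sketch. It assumes that each frame has already been assigned a phase label and that the video has a constant frame rate; the labels and the frame rate in the example are invented.

```python
from itertools import groupby


def phase_segments(frame_labels, fps):
    """Collapse per-frame labels into consecutive phases with start and duration."""
    segments = []
    start = 0
    for phase, run in groupby(frame_labels):
        length = len(list(run))
        segments.append({
            "phase": phase,
            "start_s": start / fps,
            "duration_s": length / fps,
        })
        start += length
    return segments


# Invented example: 6 seconds of video at 25 frames per second.
labels = ["download"] * 50 + ["black_screen"] * 75 + ["boot_logo"] * 25
for segment in phase_segments(labels, fps=25):
    print(segment)
```

Note that a single misclassified frame splits a phase into several segments, so a real pipeline would probably smooth the label sequence first, for example with a majority vote over a short window.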

Fig.5 Example of a frame-exact classification sequence

Fig.6 The black screen

Making development performance measurable – KPIs for the build pipeline

The key to improving the customer experience in this case is to quantify the impact of software changes and make it visible. Potential KPIs include the overall update time, the black-screen time, and the download and inflation time. Alerting is based on thresholds: a measurement raises a warning when it approaches a hard limit and an error when it exceeds it.

The KPIs should be accessible within the build pipeline. One of the most popular tools for managing the build pipeline is Jenkins (https://Jenkins.io). The end-to-end tester could be integrated into Jenkins as a plugin, giving you access to the current build and all former builds. On the highest level, the result for each deployment of a build is summarized as a weather-style status: OK, warning or error. You can drill down into the measurements for each phase and even look at the timing diagrams or watch the test video. The actual detail view is embedded in the Jenkins dashboard as an iframe.
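A threshold-based evaluation of this kind could look like the following sketch. The KPI names, the limits and the measured values are purely illustrative assumptions; the article does not publish the real thresholds.

```python
SEVERITY = {"OK": 0, "warning": 1, "error": 2}


def evaluate_kpi(value, warn_at, fail_at):
    """Classify one measurement against a warning and a hard limit."""
    if value >= fail_at:
        return "error"
    if value >= warn_at:
        return "warning"
    return "OK"


def evaluate_build(measurements, limits):
    """Return the overall status and the status of each KPI."""
    per_kpi = {
        name: evaluate_kpi(measurements[name], warn_at, fail_at)
        for name, (warn_at, fail_at) in limits.items()
    }
    # The overall status is the most severe status of any single KPI.
    overall = max(per_kpi.values(), key=SEVERITY.__getitem__)
    return overall, per_kpi


# Hypothetical limits in seconds: (warning threshold, hard limit).
LIMITS = {
    "overall_update_s": (600, 900),
    "black_screen_s": (20, 30),
    "download_inflate_s": (240, 360),
}

measured = {"overall_update_s": 540, "black_screen_s": 24, "download_inflate_s": 200}
print(evaluate_build(measured, LIMITS))
```

Taking the most severe per-KPI status as the build status means that one regression, such as a longer black screen, is enough to flag the whole build.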

Fig.7 Integration into the build pipeline

End-to-end testing architecture

While setting up the architecture, we focused on making it entirely serverless. The process starts in Jenkins, which completes the build and the deployment, e.g. to the test environment. Afterwards, Jenkins calls an API Gateway endpoint with details like BUILD_ID or BUILD_DISPLAY_NAME. This triggers a Lambda function that creates deployment and test metadata in DynamoDB; the whole identification and metadata handling of the end-to-end testing process takes place here. The actual test automation is handled by a step function that manages the workflow through the process: starting video acquisition, pressing the update button, starting the inference on the video material, updating the metadata, … The derived artifacts are stored in S3 as static content. Dynamic content is added by a Lambda function. The web interface is embedded in the Jenkins dashboard.
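The first step of this flow, the Lambda function that records a new test run, might look roughly like this. It is a sketch under several assumptions: Jenkins sends a JSON body through an API Gateway proxy integration, and the table name `e2e-test-runs` as well as the item attributes are hypothetical.

```python
import json
import time
import uuid


def build_test_record(event):
    """Turn an API Gateway proxy event into a metadata item for one test run."""
    body = json.loads(event.get("body") or "{}")
    return {
        "test_id": str(uuid.uuid4()),
        "build_id": str(body["BUILD_ID"]),
        "build_display_name": body.get("BUILD_DISPLAY_NAME", ""),
        "status": "CREATED",
        "created_at": int(time.time()),
    }


def handler(event, context, table=None):
    """Lambda entry point: store the metadata and return the new test id."""
    if table is None:
        # Imported lazily so the sketch can be exercised without AWS access.
        import boto3
        table = boto3.resource("dynamodb").Table("e2e-test-runs")  # hypothetical name
    item = build_test_record(event)
    table.put_item(Item=item)
    return {"statusCode": 201, "body": json.dumps({"test_id": item["test_id"]})}
```

The returned `test_id` could then be passed on to the workflow so that every later step (video acquisition, inference, artifact upload) updates the same record.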

Fig.8 Architecture

Architectural alternatives

There are alternatives to storing the videos on S3 and waiting for completion. The disadvantage of the current architecture is that you record fixed-length video snippets: because the inference runs in the cloud after the recording, you don’t know when to stop. One alternative would be to run the classification directly on the AWS DeepLens (which is currently not very fast). Another would be to use Kinesis Video Streams (example architectural building block) and do the classification in real time in the cloud. It might also be possible to trigger a SageMaker batch job for classification directly from the step function.
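If the classification ran in real time, the recording could stop as soon as the device has reached its final state. The following sketch shows one possible stopping rule. The rule itself, the phase name `home_screen` and the number of stable frames are assumptions made for illustration, not part of the described system.

```python
def frames_until_done(labelled_frames, final_phase, stable_frames):
    """Yield (index, phase) pairs and stop once final_phase has been
    observed for stable_frames consecutive frames."""
    streak = 0
    for index, phase in enumerate(labelled_frames):
        yield index, phase
        streak = streak + 1 if phase == final_phase else 0
        if streak >= stable_frames:
            return


# Invented label stream: the home screen flickers once before it is stable.
stream = ["flash"] * 3 + ["home_screen"] + ["flash"] + ["home_screen"] * 10
consumed = list(frames_until_done(stream, "home_screen", stable_frames=3))
print(len(consumed))  # 8
```

Requiring several consecutive frames guards against stopping on a single misclassified frame.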

Conclusion and other fields of application for physical regression testing

The end-to-end testing process enables you to check whether your product actually behaves as intended. It can be used for regression tests and generic quality assurance tests of physical products. In our case the behavior of the physical product is linked directly to the software development process by introducing KPIs, warnings and errors that nudge developers towards keeping track of the impact of their changes to the software. One concrete idea for applying the end-to-end tester would be quality testing of cars before delivery: is every switch working? In the future, automatic cabling configuration could also be an interesting area in which the end-to-end tester could improve manufacturing speeds. In combination with the robotic actor, the computer vision algorithm could also be used to perform much simpler durability tests, as you might know them from IKEA, that react dynamically to the test object while monitoring its condition.

If you have an idea of how physical regression testing could be applied, or you just want to discuss our approach, please reach out to me via mail or Twitter (@kherings). If you want to get more into Deep Learning, I recommend checking out our Deep Learning Bootcamp or our YouTube channel.

Image sources: Vorwerk, YouTube, AWS, DLR (CC-BY 3.0).

Blog author

Kai Herings
