Automating physical regression testing of products with computer vision and robotics
Testing a physical product can be a highly manual task. Advances in Deep Learning and computer vision now allow us to strive for a higher degree of automation.
We have worked on a fully integrated approach to perform physical tests of a device and funnel the feedback right back to the development team. This is done in a way that integrates performance and quality-oriented tests directly with the build pipeline. Regression testing of physical devices with new software can be quite painful since you need to ‘walk’ through all functionalities and the tester needs to be ‘awake’ 🙂 to spot even minor deviations. Furthermore, they and their peers need to have a good and common understanding of the desired behavior.
Thermomix as an example case for physical testing
Fig.1 Thermomix (Source: Vorwerk)
The one aspect we focus on in this article is the upgrade or boot process. Changes to the software can have a significant impact on the duration of the update/boot process, e.g. by changing the times for downloading the image, writing a data blob to flash memory or upgrading drivers. Depending on the tasks involved, the upgrade might affect the user interface and therefore the customer experience. For example, if the screen driver is disabled during reboot or flashing, the screen and status LEDs go dark. If this state lasts too long, the customer could mistakenly think that the update failed and maybe even unplug the device during this critical phase, according to the motto “Have you tried turning it off and on again?” 😉
Since this device manufacturer is focused on optimizing the customer experience, conditions like this should be avoided altogether.
The interface is fully tested during the development process, but testing delivered devices is a challenge. Because the images of the embedded devices are encrypted, production devices cannot be easily debugged. Devices cannot be downgraded for security reasons, and you need to deploy the software through the production software delivery stack. So far, the update has to be walked through by hand and checked upon deployment.
Goals for an AI-driven physical regression testing process
In order to have a stable, repeatable regression testing process for physical devices, we have defined specific aspects and functionalities that we want to achieve:
- identification of upgrade/boot-process phase
- generation of upgrade timing diagrams
- acquisition of test videos
- augmentation of test videos with boot phase ID
- integration of results into the build pipeline
- comparison of test with development KPIs
- integration of an actor to physically click through the process
As a side requirement, we wanted to build a fully serverless setup on the AWS stack.
Using Deep Learning to detect update and boot phases
At the core of the test chain is the understanding of the screen contents and behavior of the device. We used Deep Learning algorithms to classify the current screen contents. In order to get started with the idea as fast as possible, we trained our classifier on existing videos of update processes: a randomized sample of their frames was annotated and used as the training set. The boot/update phases are divided into eight classes.
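Drawing a randomized sample of frames for annotation can be sketched as follows. This is only an illustration, not the code used in the project: the function name, the fixed seed, the video length and the sample size are all assumptions.

```python
import random


def sample_frame_indices(total_frames, sample_size, seed=0):
    """Return a sorted random sample of frame indices, drawn without replacement."""
    if sample_size > total_frames:
        raise ValueError("sample_size must not exceed total_frames")
    rng = random.Random(seed)  # a fixed seed keeps the sample reproducible
    return sorted(rng.sample(range(total_frames), sample_size))


# Hypothetical example: pick 200 frames from a 10-minute video recorded at 25 fps.
indices = sample_frame_indices(total_frames=10 * 60 * 25, sample_size=200)
```

The selected frames would then be extracted from the video and labeled by hand with one of the eight classes.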
Fig.2 Screen outputs during the update process map to classes
The resulting model was deployed as an AWS Sagemaker inference endpoint that receives a frame as input, classifies it and returns the identified boot/update phase. As of now, the approach uses a Convolutional Neural Network (CNN; you can find follow-up reading in our blog), more specifically ResNet. We’ve deployed the inference endpoint on an ml.p2.xlarge instance, which is one of the smaller GPU-accelerated instances. The selected instance size was sufficient to work through the videos in near real time.
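Calling such an endpoint for a single frame might look like the sketch below. It assumes that the endpoint accepts a JPEG image with content type `application/x-image` and answers with a JSON list of class probabilities, which is the format of Sagemaker's built-in image classification algorithm; the article does not state which container was used. The phase labels are placeholders, since only the number of classes is given, and the endpoint name is supplied by the caller.

```python
import json

# Placeholder labels: the article only states that there are eight classes.
PHASES = [f"phase_{i}" for i in range(8)]


def parse_prediction(response_body, labels=PHASES):
    """Map a JSON list of class probabilities to (label, confidence)."""
    probabilities = json.loads(response_body)
    if len(probabilities) != len(labels):
        raise ValueError("unexpected number of classes in response")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best], probabilities[best]


def classify_frame(jpeg_bytes, endpoint_name):
    """Send one JPEG frame to the endpoint and return (label, confidence)."""
    import boto3  # imported lazily so parse_prediction works without AWS access

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",
        Body=jpeg_bytes,
    )
    return parse_prediction(response["Body"].read())
```

Keeping the response parsing separate from the network call makes the mapping from probabilities to phases testable without a deployed endpoint.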
Fig.3 Sagemaker Classification and resulting artifacts
A perspective for further development is to detect input fields, buttons and output in order to navigate dynamically through known (regression tests) or unknown (exploratory tests) interfaces. The acquired positions of the GUI components would serve as input coordinates that are converted by a translation matrix into coordinates for the robot arm that operates the device being tested. Furthermore, it would be possible to build a closed feedback loop that validates the inputs dialed in with the input wheel.
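The coordinate conversion can be illustrated with a 2D affine transform in homogeneous coordinates. This is a minimal sketch under stated assumptions: the calibration values (0.2 mm per pixel, an offset of 50 mm and 120 mm, a flipped y axis) and the bounding box are invented for illustration, and a real setup would need a calibrated matrix and probably a third dimension.

```python
def apply_affine(matrix, point):
    """Apply a 3x3 homogeneous 2D affine transform to an (x, y) point.

    The last matrix row is assumed to be (0, 0, 1) and is therefore ignored.
    """
    x, y = point
    (a, b, tx), (c, d, ty), _ = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)


def bbox_center(box):
    """Return the center of a bounding box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


# Hypothetical calibration: pixels to millimeters in the robot frame.
SCREEN_TO_ROBOT = (
    (0.2, 0.0, 50.0),
    (0.0, -0.2, 120.0),
    (0.0, 0.0, 1.0),
)

# A detected button (pixel coordinates) becomes a target position for the arm.
target = apply_affine(SCREEN_TO_ROBOT, bbox_center((100, 200, 300, 260)))
```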
Fig.4 Localization of GUI elements by object detection
As a result, we generate frame-exact timing diagrams that show the exact update flow and the duration of the phases. The fully annotated videos of the tests are stored in order to be able to view the details of the test later on. All test artifacts are accessible through the build pipeline output.
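Turning per-frame classifications into a timing diagram is essentially a run-length encoding of the label sequence. The following sketch assumes a constant frame rate; the phase names and the frame rate are hypothetical.

```python
from itertools import groupby


def phase_segments(frame_labels, fps):
    """Collapse per-frame labels into (phase, start_seconds, duration_seconds)."""
    segments = []
    frame_index = 0
    for phase, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames in this phase
        segments.append((phase, frame_index / fps, length / fps))
        frame_index += length
    return segments


# Hypothetical classification result for a short clip recorded at 25 fps.
labels = ["download"] * 50 + ["black_screen"] * 25 + ["boot"] * 100
segments = phase_segments(labels, fps=25)

# Print a simple text timing diagram, one '#' per second of duration.
for phase, start, duration in segments:
    print(f"{phase:<14}{start:>6.2f}s  {'#' * round(duration)}")
```

Because every frame contributes to exactly one segment, the durations are exact to one frame, i.e. to 1/fps seconds.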
Fig.5 Example of a frame-exact classification sequence
Fig.6 The black screen
Making development performance measurable – KPIs for the build pipeline
The key to improving the customer experience in this case is to quantify the impact of software changes and make it visible. Potential KPIs include the overall update time, the black-screen time and the download and inflation time. Alerting is based on thresholds: a value raises a warning when it approaches a hard limit and an error when it exceeds it.
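A threshold check of this kind could be sketched as follows. The KPI names follow the text, but all limits and measurements are invented for illustration, and treating the worst KPI as the overall build status is an assumption.

```python
SEVERITY = {"OK": 0, "warning": 1, "error": 2}


def kpi_status(value, warn_at, error_at):
    """Classify one measurement against its warning threshold and hard limit."""
    if value >= error_at:
        return "error"
    if value >= warn_at:
        return "warning"
    return "OK"


def build_status(measurements, limits):
    """Return the worst status over all KPIs plus the per-KPI statuses."""
    statuses = {
        name: kpi_status(measurements[name], warn_at, error_at)
        for name, (warn_at, error_at) in limits.items()
    }
    overall = max(statuses.values(), key=SEVERITY.__getitem__)
    return overall, statuses


# Hypothetical limits in seconds: (warning threshold, hard limit).
LIMITS = {
    "total_update_s": (600, 900),
    "black_screen_s": (20, 30),
    "download_inflation_s": (240, 300),
}
measured = {"total_update_s": 540, "black_screen_s": 24, "download_inflation_s": 200}
overall, per_kpi = build_status(measured, LIMITS)
```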
The KPIs should be accessible within the build pipeline. One of the most popular tools for managing the build pipeline is Jenkins (https://Jenkins.io). The end-to-end tester could be integrated into Jenkins as a plugin, giving you access to the current build and all former builds. Each result for a deployment of a build is summarized at the top level as a weather-style status: OK, warning or error. You can drill down into the measurements for each phase and even have a look at timing diagrams or watch the test video. The actual detail view is embedded as an iframe in the Jenkins dashboard.
Fig.7 Integration into the build pipeline
End-to-end testing architecture
While setting up the architecture, we’ve focussed on making it entirely serverless. The process starts in Jenkins, which completes the build and the deployment, e.g. to the test environment. Afterwards, Jenkins calls an API gateway with details like BUILD_ID or BUILD_DISPLAY_NAME. This triggers a Lambda function that creates deployment and test metadata in DynamoDB, where the whole identification and metadata of the end-to-end testing process is handled. The actual test process automation is handled by a step function that manages the workflow through the process: starting video acquisition, pressing the update button, starting the inference of the video material, updating the metadata, … The derived artifacts are stored in S3 as static content. Dynamic content is added by using a Lambda function. The web interface is embedded in the Jenkins dashboard.
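The Lambda function behind the API gateway could look roughly like the sketch below. It assumes an API Gateway proxy integration, so the request arrives as a JSON string in `event["body"]`. The table name `e2e-tests`, the item attributes and the `PENDING` status are hypothetical; only BUILD_ID and BUILD_DISPLAY_NAME come from the description above.

```python
import json
import time


def handler(event, context, table=None):
    """Create the metadata record for a new end-to-end test run.

    `table` can be injected for local testing; in Lambda it defaults to a
    DynamoDB table with a hypothetical name.
    """
    if table is None:
        import boto3  # imported lazily so the handler can be tested offline

        table = boto3.resource("dynamodb").Table("e2e-tests")

    body = json.loads(event["body"])
    item = {
        "build_id": body["BUILD_ID"],
        "display_name": body["BUILD_DISPLAY_NAME"],
        "status": "PENDING",  # hypothetical initial state of the test run
        "created_at": int(time.time()),
    }
    table.put_item(Item=item)
    return {"statusCode": 201, "body": json.dumps(item)}
```

Injecting the table keeps the handler free of AWS dependencies in unit tests; starting the step function would be a further call that is omitted here.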
There are alternatives to storing the videos on S3 and to waiting for completion. The disadvantage of this architecture is that you record fixed-length video snippets: because the inference runs in the cloud, you don’t know when to stop recording. One alternative would be to bring the classification to the Deep Lens (which is currently not so fast). Another way would be to use Kinesis Video Streams (example architectural building block) and do the classification in real time in the cloud. There might also be the possibility of triggering a Sagemaker batch job for classification directly from the step function.
Conclusion and other fields of application for physical regression testing
The end-to-end testing process enables you to check whether your product actually behaves as intended. This can be used for regression and generic quality assurance tests of physical products. In our case, the behavior of the physical product is linked directly with the software development process by introducing KPIs, warnings and errors that nudge developers towards keeping track of the impact of their changes to the software. One concrete idea for applying the end-to-end tester would be quality testing of cars before delivery: is every switch working? In the future, automatic cabling configuration could also be an interesting area in which the end-to-end tester could improve manufacturing speeds. In combination with the robotic actor, the computer vision algorithm could also be used to perform much simpler durability tests, as you might know them from IKEA, that react dynamically to the test object while monitoring its condition.
If you have an idea for how physical regression testing could be applied, or you just want to discuss our approach, please reach out to me via mail or Twitter (@kherings). If you want to get more into Deep Learning, I recommend checking out our Deep Learning Bootcamp or our YouTube channel.
Image sources: Vorwerk, Youtube, AWS, DLR (CC-BY 3.0).