Mutation Testing: Watching the Watchmen

25.1.2016 | 7 minutes reading time

You can’t do without automated (unit) tests if you want to stay on top of the ever-increasing complexity of software projects. A mutation testing framework ‘watches the watchmen’ by inserting small changes into your compiled byte code and then validating your test suites against these intentional bugs. As a quality safeguard it’s much more effective than traditional source code validation. It is even a challenging way to improve your coding skills, and it makes writing test suites fun again.

As a largely self-taught programmer I am still thankful for the millennial internet boom, when a linguistics graduate could get hired with nothing more than a thirst for coding and no professional experience or relevant diplomas to show for it. I like to think my coding skills have improved in the fifteen years since. I started out, like most glorified amateurs, with the anti-bureaucratic “just get it done even if it requires duct tape” attitude to coding, where writing unit tests only gets in the way of shipping code. Read Joel Spolsky on Netscape pioneer Jamie Zawinski. Nowadays I believe in the school of “You can have good, cheap or fast. Pick two.” I don’t do duct tape anymore, except for small home repairs.

If a job is worth doing, it is worth doing well. Good code written by fallible humans needs automated tests. Too bad it’s not really the most fun part of programming. For many who write code on a daily basis, test-driven development still doesn’t come naturally. Plenty of lip service is paid, but we’re just not that motivated to do it properly unless we’re harassed with minimum thresholds for test coverage. Tests feel like unexciting pieces of code making sure that other code which puts a blue ball in a red box has indeed put a blue ball in a red box, and not a green ball in an orange box. A pedantic bureaucratic requirement. “I can write working code, you know. Let me just get on with making customers happy. They don’t care about stupid unit tests.”

That’s a dangerous attitude. Unit tests matter. Integration tests matter even more. You should get motivated. Test cases are a yardstick for quality while you’re designing your code, not something to add when you have time to spare. Code that cannot be properly unit-tested is a likely candidate for some serious overhaul. However, full rewrites are a waste of time and money. Agile development means software is continuously thought out, written down, re-thought and re-written until the user is happy or the money runs out, whichever comes first. Adopting such a just-in-time approach means that a code base is not only being expanded with new code, but the existing code adapts to the more complex software architecture without breaking the functionality it already provides. This change is happening all the time, ideally from day one, but no later than day three. Good tests make this incremental refactoring possible. Bad tests make it impossible.

Greater minds than mine have written great books about the need for refactoring and why meaningful tests should cover your entire code base. I shouldn’t have to convince you. Let me just add my own argument here:

If you can’t test some piece of code, I don’t trust you to understand it.

Understanding how each line of code relates to the whole is a major challenge. Here’s something to put Moore’s Law into practical perspective. I am writing this post on a fancy MacBook with 16 GB of memory. In the early eighties I saved up my precious pocket money to buy a 16 KB memory module for my VIC-20 home computer. That’s a million times more memory in thirty years. Human short-term memory is stuck at a measly seven items and probably has been since Socrates. Yes, we have better tools and faster compilers, but we have fixed-size brains that have to deal with ever more complex projects. The only way to comprehend these is to break them down into manageable, testable chunks.

The size of a hefty chocolate bar

How do we make sure our tests are any good? “Code coverage”, I hear you say. True, that’s a useful criterion, usually expressed as the percentage of source code that is executed by the test suites, but coverage in itself is not enough. Test suites need to pummel your code with a wide range of sensible and wacky input parameters and assert the results. Ineffective, bogus unit tests that cover the source code and please the robot don’t really validate anything. Quantifying quality is not bad in itself, but insisting on a minimum percentage of test coverage without some assurance that your tests are actually any good only lulls you into a false sense of security, particularly if you have too many developers in your team with the duct-tape attitude I just described. If management is okay with the practice, I advise you to find another employer. If you’re okay with it, I urge you to change careers.
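
To make the difference concrete, here is a minimal sketch of a test that pleases the coverage robot next to one that actually validates something. The class and method names (`BallSorter`, `boxFor`, `check`) are hypothetical and exist only for this illustration, and I use plain Java instead of JUnit so the snippet runs without any dependencies.

```java
// Hypothetical example: all names are made up for illustration.
final class CoverageDemo {
    /** A bogus "test": executes every line of boxFor but asserts nothing. */
    static void coverageOnlyTest() {
        BallSorter.boxFor("blue");
        BallSorter.boxFor("green");
    }

    /** A meaningful test: identical coverage, but the results are checked. */
    static void assertingTest() {
        check("red".equals(BallSorter.boxFor("blue")), "blue ball goes in the red box");
        check("orange".equals(BallSorter.boxFor("green")), "other balls go in the orange box");
    }

    /** Tiny stand-in for a test framework's assertion method. */
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        coverageOnlyTest();
        assertingTest();
        System.out.println("Both tests are green, but only one of them verifies anything.");
    }
}

final class BallSorter {
    /** Returns the colour of the box that a ball of the given colour belongs in. */
    static String boxFor(String ballColour) {
        if ("blue".equals(ballColour)) {
            return "red";
        }
        return "orange";
    }
}
```

Both tests would yield the same line coverage for `boxFor`. If someone broke the method so that it always returned "orange", only `assertingTest` would notice.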

You can (and probably should) have dedicated testers that put the finished product through rigorous manual testing. If they are worth their money they will find something that your test suites didn’t catch. But how much nicer it would be if there were an automated way to check if your tests are any good. To test your tests, so to speak. Mutation testing does just that. Mutation testing is based on a simple assumption: if your test suites fully validate the behaviour of your program, then changing the behaviour of the program by inserting significant changes should cause at least some of your tests to fail.

For those who need a car analogy: suppose the car is your source code, the test case involves doing a three-point turn and the JUnit runner is behind the wheel. The test succeeds if the car points the other way unscathed. Mutation testing will do several evil things to your car and expects that your tests will sniff them out. It will remove the battery: did you check that the engine has started? It will skew the rear-view mirror: did you adjust it for a clear view?

The framework inserts small but significant changes (‘mutants’) in your compiled code. Examples are swapping out arithmetic and equality operators, making methods return null or just removing method calls: basically anything that leaves the outward interface intact. Your unit tests should however detect the mutation and fail, thus killing the mutant. If not, the mutant has survived, indicating that your test may be insufficient. Rather than looking at code coverage alone, the percentage of mutants killed becomes the true indicator of quality.
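
As a rough illustration of a surviving and a killed mutant, the sketch below simulates a single mutation by hand. A real framework changes the compiled byte code for you; here the ‘mutant’ is simply a second lambda in which `>=` has been swapped for `>`. All names (`MutantDemo`, `weakTestPasses`, `strongTestPasses`) are hypothetical.

```java
import java.util.function.IntPredicate;

// Hypothetical example: the mutant is written by hand to mimic an operator swap.
final class MutantDemo {
    /** Original code under test: is someone of this age an adult? */
    static final IntPredicate ORIGINAL = age -> age >= 18;

    /** Stand-in for a mutant: the comparison operator >= became >. */
    static final IntPredicate MUTANT = age -> age > 18;

    /** Weak test: only probes values far away from the boundary. */
    static boolean weakTestPasses(IntPredicate isAdult) {
        return isAdult.test(30) && !isAdult.test(5);
    }

    /** Stronger test: also probes both sides of the boundary. */
    static boolean strongTestPasses(IntPredicate isAdult) {
        return isAdult.test(30) && !isAdult.test(5)
            && isAdult.test(18) && !isAdult.test(17);
    }

    public static void main(String[] args) {
        System.out.println("weak test on original:   " + weakTestPasses(ORIGINAL));   // true
        System.out.println("weak test on mutant:     " + weakTestPasses(MUTANT));     // true, mutant survives
        System.out.println("strong test on original: " + strongTestPasses(ORIGINAL)); // true
        System.out.println("strong test on mutant:   " + strongTestPasses(MUTANT));   // false, mutant killed
    }
}
```

The weak test stays green for both versions, so the mutant survives. The stronger test fails against the mutated version because it checks the value 18, which is exactly where the two versions disagree.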

A very useful mutation testing framework for Java is PIT (pitest.org). It integrates well with standard build tools and has simple but effective Eclipse and IntelliJ plugins. You can get up and running in a few minutes. PIT will output a neat HTML report that puts code coverage next to mutation coverage and has a detailed view specifying which mutants have survived, i.e. were not caught by the test suite.

Is that a useful indicator? Can’t you just fool the machine like you can with code coverage? Not really. Mutation testing simply won’t let you get away with sloppy testing. Introducing a small but significant change in a class must break at least one test, i.e. kill the mutant. Mutants are more likely to survive when coverage is poor, but I have found plenty of surviving mutants in code I thought was well tested. In the PIT report, the percentage of mutants killed is typically the lower of the two figures.
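
The percentage of mutants killed is simple arithmetic: run the suite once per mutant and count how often it fails. The toy harness below is only a sketch of that idea under my own assumptions, not how PIT works internally. The discount rule, the hand-written mutants and all names are hypothetical.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical example: a hand-rolled mutation score for one small function.
final class MutationScoreDemo {
    /** Original code: orders over 100 get a discount of 10. */
    static final IntUnaryOperator ORIGINAL = total -> total > 100 ? total - 10 : total;

    // Hand-written stand-ins for the mutants a framework might generate.
    static final IntUnaryOperator BOUNDARY_CHANGED = total -> total >= 100 ? total - 10 : total;
    static final IntUnaryOperator ARITHMETIC_SWAPPED = total -> total > 100 ? total + 10 : total;
    static final IntUnaryOperator BRANCH_REMOVED = total -> total;
    static final IntUnaryOperator RETURN_REPLACED = total -> 0;

    static final List<IntUnaryOperator> MUTANTS =
        List.of(BOUNDARY_CHANGED, ARITHMETIC_SWAPPED, BRANCH_REMOVED, RETURN_REPLACED);

    /** The test suite: true when every check passes for the given implementation. */
    static boolean suitePasses(IntUnaryOperator price) {
        return price.applyAsInt(200) == 190 && price.applyAsInt(50) == 50;
    }

    /** Percentage of mutants for which the suite fails, i.e. that are killed. */
    static double mutationScore(List<IntUnaryOperator> mutants) {
        long killed = mutants.stream().filter(mutant -> !suitePasses(mutant)).count();
        return 100.0 * killed / mutants.size();
    }

    public static void main(String[] args) {
        System.out.println("suite green on original: " + suitePasses(ORIGINAL));
        System.out.println("mutants killed: " + mutationScore(MUTANTS) + "%");
    }
}
```

The suite executes every branch of the original, yet one of the four mutants slips through: the changed boundary survives because no test probes a total of exactly 100. Adding that one check would kill it.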

Mutation testing forces you to take testing seriously, there’s no doubt about it. Integrating it into your daily routine trains you to code more defensively and write good tests by adding a competitive element to the mix. Test-driven development can feel like playing a game of chess with yourself. You write some code, and then you write a test for it. It’s hardly a victory to see a green test suite. Now when mutants rear their ugly heads, writing these tests becomes a lot less predictable, but also more challenging. Any lazy bum can do 100% code coverage. It’s easy. Killing all the mutants in an intricate piece of source code, now that’s a sweet-tasting victory.

In a next post, I will go into more detail on how to incorporate mutation testing sensibly and productively.

Blog author

Jasper Sprengers
