Sensible mutation testing: don’t go on a killing spree

26.2.2016 | 6 minutes reading time

This is a follow-up to my earlier post about mutation testing (MT). To recap: MT helps you ensure that your unit tests are any good. The framework manipulates your compiled code by inserting small changes (mutants). It then re-runs your tests and expects the unit tests that touched the mutated code to fail. A failing test means the mutant was killed. This implies that your test suite not only covered the affected lines in the source code, but also made an assertion about their behaviour that the mutant broke. As you all know, test coverage means nothing without proper assertions.

Let’s have a look at a concrete sample of a very simple business rule. You can find the code on https://github.com/jaspersprengers/pitestsample.git

public class Bouncer {
  enum Gender {
    MALE, FEMALE;
  }

  public boolean welcome(int age, Gender gender) {
    if (gender == Gender.FEMALE) {
      return age >= 18;
    } else {
      // Any other value, including null, is held to the stricter limit.
      return age >= 21;
    }
  }
}

Our bouncer lets in only men aged 21 or over and women aged 18 or over. Now consider the following two unit tests:

@Test
public void test21YearOldMale() {
  assertThat(filter.welcome(21, MALE)).isTrue();
}

@Test
public void test19YearOldFemale() {
  assertThat(filter.welcome(19, FEMALE)).isTrue();
}

And let’s run the tests with coverage:

Cool! We have 100% rock solid coverage with only two tests. Let’s see what PIT has to say about this brave effort. If you run

mvn -DwithHistory clean install org.pitest:pitest-maven:mutationCoverage

on the command line from the project root you can find the test results in target/pit-reports.

Not so great… PIT inserted 19 mutants, of which seven survived, despite our 100% coverage. Let’s take a detailed look.

It should not come as a surprise that our test suite is sadly inadequate:

We did not allow for the gender argument to be null.
We did not check for edge cases. Taking into account the nearest integer values to 18 and 21 (the values used in the comparisons) means we should check for 17, 18, 19, 20, 21 and 22.
In combination with the three possible values for gender this means we have 18 possible scenarios to test.
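To make the gap tangible, here is a hand-written stand-in for one mutant of the kind a boundary mutation produces. It is only a sketch: PIT mutates bytecode rather than source, and the names BoundaryMutantDemo, Rule, originalSuitePasses and boundarySuitePasses are my own, not part of the sample project. The mutant changes ">= 18" to "> 18" and still passes both original tests.

```java
// Hand-written illustration of a surviving mutant; not generated by PIT.
class BoundaryMutantDemo {
    enum Gender { MALE, FEMALE }

    // Lets us run the same checks against the original rule and the mutant.
    interface Rule {
        boolean welcome(int age, Gender gender);
    }

    // The rule as implemented in Bouncer.
    static boolean original(int age, Gender gender) {
        if (gender == Gender.FEMALE) {
            return age >= 18;
        } else {
            return age >= 21;
        }
    }

    // The same rule with a single change: ">= 18" has become "> 18".
    static boolean mutant(int age, Gender gender) {
        if (gender == Gender.FEMALE) {
            return age > 18;
        } else {
            return age >= 21;
        }
    }

    // The two original tests: true means both assertions hold.
    static boolean originalSuitePasses(Rule rule) {
        return rule.welcome(21, Gender.MALE) && rule.welcome(19, Gender.FEMALE);
    }

    // The original tests plus one check exactly on the female boundary.
    static boolean boundarySuitePasses(Rule rule) {
        return originalSuitePasses(rule) && rule.welcome(18, Gender.FEMALE);
    }

    public static void main(String[] args) {
        System.out.println("mutant survives original suite: "
                + originalSuitePasses(BoundaryMutantDemo::mutant));
        System.out.println("mutant survives boundary suite: "
                + boundarySuitePasses(BoundaryMutantDemo::mutant));
    }
}
```

The 19-year-old in the original test sits one step away from the boundary, so the changed comparison goes unnoticed. Only a check at exactly 18 tells the two versions apart.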

So to outsmart PIT we need to test for all these edge cases:

@Test
public void testAllMales() {
  assertThat(filter.welcome(17, MALE)).isFalse();
  assertThat(filter.welcome(18, MALE)).isFalse();
  assertThat(filter.welcome(19, MALE)).isFalse();
  assertThat(filter.welcome(20, MALE)).isFalse();
  assertThat(filter.welcome(21, MALE)).isTrue();
  assertThat(filter.welcome(22, MALE)).isTrue();
  assertThat(filter.welcome(17, FEMALE)).isFalse();
  assertThat(filter.welcome(18, FEMALE)).isTrue();
  assertThat(filter.welcome(19, FEMALE)).isTrue();
  assertThat(filter.welcome(20, FEMALE)).isTrue();
  assertThat(filter.welcome(21, FEMALE)).isTrue();
  assertThat(filter.welcome(22, FEMALE)).isTrue();
  assertThat(filter.welcome(17, null)).isFalse();
  assertThat(filter.welcome(18, null)).isFalse();
  assertThat(filter.welcome(19, null)).isFalse();
  assertThat(filter.welcome(20, null)).isFalse();
  assertThat(filter.welcome(21, null)).isTrue();
  assertThat(filter.welcome(22, null)).isTrue();
}

If we do that and rerun PIT, all mutants are killed.
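If the wall of assertions bothers you, the same 18 scenarios can be written as a table. What follows is a minimal sketch in plain Java rather than the test from the sample project; BouncerTableDemo and mismatches are names I made up, and the rule is copied in so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Table-driven variant of the 18 boundary checks (illustrative, not from the sample project).
class BouncerTableDemo {
    enum Gender { MALE, FEMALE }

    // Copy of the Bouncer rule so that this sketch is self-contained.
    static boolean welcome(int age, Gender gender) {
        if (gender == Gender.FEMALE) {
            return age >= 18;
        } else {
            return age >= 21;
        }
    }

    static final int[] AGES = {17, 18, 19, 20, 21, 22};
    static final Gender[] GENDERS = {Gender.MALE, Gender.FEMALE, null};

    // One row of expected outcomes per entry in GENDERS, one column per entry in AGES.
    static final boolean[][] EXPECTED = {
        {false, false, false, false, true, true}, // MALE
        {false, true, true, true, true, true},    // FEMALE
        {false, false, false, false, true, true}, // null, treated like MALE
    };

    // Returns a description of every mismatch; an empty list means all 18 cases pass.
    static List<String> mismatches() {
        List<String> mismatches = new ArrayList<>();
        for (int g = 0; g < GENDERS.length; g++) {
            for (int a = 0; a < AGES.length; a++) {
                boolean actual = welcome(AGES[a], GENDERS[g]);
                if (actual != EXPECTED[g][a]) {
                    mismatches.add(GENDERS[g] + " aged " + AGES[a]
                            + ": expected " + EXPECTED[g][a]);
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + mismatches());
    }
}
```

Collecting mismatches instead of stopping at the first failed assertion means one run reports every combination that is wrong. The effort of working out 18 expected values stays the same.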

This example clearly shows how treacherous 100% coverage can be and how laborious it is to cover all edge cases and kill all mutants. We needed 18 test assertions to cover four lines of not very difficult code. Which begs the question…

Should all mutants be killed?

The short answer: no.

The long answer: it depends. It’s like insisting on 100% test coverage. Some code is more complex than other code and is therefore more prone to errors. Complex algorithms and regular expressions are different from boilerplate statements like:

try {
  connection.close();
} catch (SQLException e) {
  complain("checked exceptions: when did it ever seem a good idea??");
}

Getting 100% line coverage here would mean adding a test scenario that throws an error on connection.close() and verifies the call to complain(..). If you haven’t got it covered, PIT will remove the call to ‘complain’ and complain in turn that you didn’t kill that mutant. By all means do, if your time and budget are limitless.
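For a sense of what that extra scenario costs, here is a rough sketch using only the JDK. Everything in it is an assumption for illustration: the Resource class, its release method and the list that records complaints are hypothetical, and java.lang.reflect.Proxy stands in for whatever mocking library you would normally reach for.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class CloseFailureDemo {
    // Hypothetical stand-in for the class that contains the try/catch above.
    static class Resource {
        final List<String> complaints = new ArrayList<>();

        void release(Connection connection) {
            try {
                connection.close();
            } catch (SQLException e) {
                complain("could not close connection: " + e.getMessage());
            }
        }

        // Records the message so that a test can verify the call happened.
        void complain(String message) {
            complaints.add(message);
        }
    }

    // Builds a Connection whose close() always throws; no other method is supported.
    static Connection failingConnection() {
        return (Connection) Proxy.newProxyInstance(
                CloseFailureDemo.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("close")) {
                        throw new SQLException("simulated failure");
                    }
                    throw new UnsupportedOperationException(method.getName());
                });
    }

    // The test scenario: force the failure and return what was complained about.
    static List<String> complaintsAfterFailedClose() {
        Resource resource = new Resource();
        resource.release(failingConnection());
        return resource.complaints;
    }

    public static void main(String[] args) {
        System.out.println(complaintsAfterFailedClose());
    }
}
```

That is a fair amount of scaffolding to prove that a catch block calls one method, which is exactly the trade-off to weigh before chasing this mutant.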

Remember that a surviving mutant is only an instance in which an artificial change to your code was not detected by your tests. It might help you detect bugs in your code, but it doesn’t always make sense to write tests for them. MT can become extremely pedantic if you enable all possible mutation operators. Here are a few that PIT supplies but that are disabled by default:

The constructor calls mutator replaces a constructor call with null: Thing t = new Thing() becomes Thing t = null;
The non-void method call mutator replaces a call to a non-void method with null (or the default primitive value): int i = calculate() becomes int i = 0;

In the Bouncer example above I enabled all operators that PIT has to offer. You can probably see why some are disabled by default. Guarding against these mutants would litter your test code with superfluous null-checks.
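Written out by hand, the two operators listed above amount to something like the following. This is an illustration only: PIT applies the changes to bytecode, and Thing, calculate and the three methods are placeholder names based on the snippets in the list.

```java
// Hand-written equivalents of two PIT mutations; placeholder names throughout.
class DisabledMutatorsDemo {
    static class Thing {
        String describe() {
            return "a thing";
        }
    }

    static int calculate() {
        return 42;
    }

    // The unmutated code.
    static String original() {
        Thing t = new Thing();
        int i = calculate();
        return t.describe() + " " + i;
    }

    // Constructor calls mutator: "new Thing()" has been replaced by null.
    static String constructorCallMutant() {
        Thing t = null;
        int i = calculate();
        return t.describe() + " " + i; // fails with a NullPointerException
    }

    // Non-void method call mutator: "calculate()" has been replaced by the default int value.
    static String nonVoidCallMutant() {
        Thing t = new Thing();
        int i = 0;
        return t.describe() + " " + i;
    }

    public static void main(String[] args) {
        System.out.println(original());
        System.out.println(nonVoidCallMutant());
    }
}
```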

Some code is not worth unit-testing

Feel free to disagree, but I’m thinking of configuration disguised as code: the stuff that used to go in XML, like the beloved Spring Security incantations:

protected void configure(HttpSecurity http) throws Exception {
  http
    .headers().disable()
    .authorizeRequests()
    .antMatchers("/v*/**/*").authenticated()
    .and()
    .httpBasic()
    .and()
    .userDetailsService(userService)
    .csrf().disable();
}

Yeah, you could mock HttpSecurity and all the other convenient ProviderBuilderConfigurerFactories inside it, but you’d do better to invest that time in a good integration test: a test that actually fires up a web container, makes calls through an HTTP client and asserts on the results. Not having any unit tests means of course that every mutation to this part of your source code survives, since no test could kill them. Let them live.
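To show the shape of such a test without dragging in Spring, here is a sketch that swaps in the JDK’s built-in HttpServer and HttpClient (Java 11 or later). It is not a Spring Security test: the path /v1/orders, the user alice and the password secret are invented, and in a real project the server would be your own application started by the test framework.

```java
import com.sun.net.httpserver.BasicAuthenticator;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class SecuredEndpointSketch {
    // Starts a throwaway server protected by HTTP Basic, calls it without and with
    // credentials, and returns both status codes.
    static int[] statusWithoutAndWithCredentials() {
        HttpServer server = null;
        try {
            // Port 0 asks the operating system for any free port.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/v1/orders", exchange -> {
                byte[] body = "[]".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            }).setAuthenticator(new BasicAuthenticator("demo") {
                @Override
                public boolean checkCredentials(String user, String password) {
                    return "alice".equals(user) && "secret".equals(password);
                }
            });
            server.start();

            URI uri = URI.create(
                    "http://127.0.0.1:" + server.getAddress().getPort() + "/v1/orders");
            HttpClient client = HttpClient.newHttpClient();
            String token = Base64.getEncoder()
                    .encodeToString("alice:secret".getBytes(StandardCharsets.UTF_8));

            int anonymous = client.send(
                    HttpRequest.newBuilder(uri).build(),
                    HttpResponse.BodyHandlers.discarding()).statusCode();
            int authenticated = client.send(
                    HttpRequest.newBuilder(uri).header("Authorization", "Basic " + token).build(),
                    HttpResponse.BodyHandlers.discarding()).statusCode();
            return new int[] {anonymous, authenticated};
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        int[] status = statusWithoutAndWithCredentials();
        System.out.println("anonymous: " + status[0] + ", authenticated: " + status[1]);
    }
}
```

The assertions are about what a client actually observes, a rejected anonymous call and an accepted authenticated one, which is the behaviour the security configuration exists to provide.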

Ignore non-business logic with predictable side-effects

Logging statements are in a sense external to your business logic: they have side effects that depend on various configuration parameters (log level, appenders, etc.). It makes no sense to unit-test those calls, and therefore PIT by default does not mutate calls to the most popular logging frameworks: log4j, slf4j and java.util.logging (sorry: that was never really popular). You can further configure PIT to ignore calls to certain classes. They will nearly always be void method calls, or called in a void context.

Domain logic should not become overly defensive

It is a good architectural principle to make your outward-facing code (e.g. servlets/REST endpoints) extra defensive and picky in what it expects. Faulty user input slipping through this layer is a common source of errors deeper down, and you should allow for many test cases of the “unhappy flow” variety. This is where MT is helpful, by making sure you have covered all the edge cases. Otherwise your inventive users will discover them for you.

Your domain logic should be rigorously protected by this outward-facing layer, which makes sure no garbage goes in. If you stick to this principle, runtime exceptions in domain logic should be quite rare. Package-level methods in your domain layer that receive only input from guaranteed (i.e. your own) sources can relax and be more trusting. You don’t need the same lock on your bathroom that you have on your front door.

MT however makes no provision for all this. It is ignorant of the fact that some code receives more garbage and hence has to do a lot more error checking and validation. PIT may insist that you null-check the input to every private method, but you know it makes no sense. Let those mutants live.
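As a small illustration of that division of labour, consider the sketch below. The names AgeCheckEndpoint and handleRequest, the string parameters and the age range of 0 to 150 are all assumptions of mine; the point is only that validation lives at the edge while the package-level method trusts its callers.

```java
import java.util.Locale;

class AgeCheckEndpoint {
    enum Gender { MALE, FEMALE }

    // Outward-facing entry point: picky about everything it receives.
    static boolean handleRequest(String ageParam, String genderParam) {
        if (ageParam == null || genderParam == null) {
            throw new IllegalArgumentException("age and gender are required");
        }
        int age;
        try {
            age = Integer.parseInt(ageParam.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("age must be a whole number", e);
        }
        if (age < 0 || age > 150) {
            throw new IllegalArgumentException("age out of range: " + age);
        }
        Gender gender;
        try {
            gender = Gender.valueOf(genderParam.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("unknown gender: " + genderParam, e);
        }
        return welcome(age, gender);
    }

    // Package-level domain method: only called with validated input, so no extra checks.
    static boolean welcome(int age, Gender gender) {
        return gender == Gender.FEMALE ? age >= 18 : age >= 21;
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("21", "male"));
        System.out.println(handleRequest("17", "female"));
    }
}
```

Mutation testing earns its keep on handleRequest, where every unhappy path deserves a test. Mutants that only matter when welcome receives input the endpoint can never pass on are the ones to let live.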

Summary

Mutation testing is a very powerful tool to urge you to go that extra mile in unit testing where it’s important. But it’s up to you to decide which code deserves that extra attention. After all, our time is precious. You could add MT to your continuous integration steps and set a maximum for the number of mutants allowed to survive, but it’s up to you to decide what a sensible threshold would be.

Blog author

Jasper Sprengers
