Performance measurement with JMH – Java Microbenchmark Harness

22.10.2017 | 7 minutes reading time

What is benchmarking and why should we do that?
If there are multiple ways to implement a feature or if we have serious doubts about performance while using a certain technology, special implementation patterns or a new “cutting edge” library, we have to make decisions. There might be assumptions about performance effects of a certain way of implementing something, but in the end – if we do not measure and compare the different approaches – we will not be sure if our decision was correct. This is true for the big architectural topics, but also for smaller-scoped concerns such as preferring a certain API method although multiple alternatives exist. So we should stop guessing and start measuring performance! In other words, we should use benchmarks! This article introduces Java Microbenchmark Harness (JMH), an OpenJDK project which aims to ease setting up a benchmark environment for Java performance tests.

Benchmark == Benchmark?
To categorize benchmarks in a more fine-grained manner, people invented benchmark categories such as “micro”, “macro”, or even “meso”, which separate performance tests by scope. Roughly speaking, the categories differ in how many layers are involved and how complex the code under test is.

Microbenchmarks are performance metrics on the lowest level. You can compare them to unit tests, which means they invoke single methods or execute small pieces of business logic without “more (cross-cutting) stuff” around.

Macrobenchmarks are the opposite of that. They test entire applications similar to end-to-end tests.

Mesobenchmarks represent something in between, which means they are written to measure entire actions (features, workflows) related to bigger parts of our applications, using different layers in interaction with each other without spinning up the entire application. This could be a single feature which uses authentication/authorization, reads data from a database, calls external services and so on. We could compare mesobenchmarks to integration tests.

In this post I will put the focus on the smallest kind of these. So let’s concentrate on the microbenchmarks.

How to implement microbenchmarks
If we want to know which methods perform better than others, we should try them out and compare them under equal conditions. A naive approach would be to call the different methods within some kind of simple unit test and look at how long the execution takes, perhaps using java.lang.System.currentTimeMillis(). Then we could just compute the difference between the start and stop timestamps. At first glance that seems sufficient to get an idea of the performance, but on closer inspection it is not. We have to take into account how the JVM executes and especially optimizes code. From this point of view, values obtained from a single execution are not reliable. There are many JVM-related optimization topics to keep in mind, and I will give some further hints below. For now, the important point is that the more often a piece of code is executed, the more information the JVM collects about it and the more it can optimize it (if possible). So if we want to measure code that will be invoked frequently in production (and that code is the crucial part of our software that we should measure), we should measure it after some warmup iterations to simulate “real” production conditions. And now it’s getting interesting (some people would rather say “complicated”).
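To make the naive approach concrete, here is a minimal sketch of a single-shot measurement with System.currentTimeMillis(). The class name, the helper methods and the digit check used as code under test are assumptions made for illustration only.

```java
// Naive single-shot timing, shown only to illustrate the approach described above.
// All names in this class are hypothetical.
class NaiveTiming {

    // Hypothetical code under test: does the string consist of digits only?
    static boolean looksLikeInteger(String s) {
        return s.matches("\\d+");
    }

    // Runs the task exactly once and returns the elapsed wall-clock time in milliseconds.
    static long timeOnceMillis(Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeOnceMillis(() -> looksLikeInteger("12345"));
        // A call this short is far below millisecond resolution, and it ran
        // only once in a cold JVM, so the printed value says very little.
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

Besides the missing warmup, the millisecond clock is too coarse for a call like this, and the result of the measured call is thrown away, which gives the JVM room to optimize the call differently than it would in real code.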

The question now is: How should the warmup be implemented? Use a boolean flag which separates warmup iterations from measurement iterations and switch that flag after some time? Maybe, but doing that again and again would be a cumbersome, error-prone task.
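A hand-rolled warmup along those lines might look like the following sketch. It is not how JMH works internally; the class, the method signature and the sink field are assumptions for illustration, and System.nanoTime() is used instead of currentTimeMillis() because of its finer resolution.

```java
import java.util.function.IntSupplier;

// Hand-rolled warmup/measurement loop; a sketch with hypothetical names.
class ManualWarmup {

    // Results are written here so that the measured calls are not obviously unused.
    static volatile int sink;

    // Runs untimed warmup iterations followed by measured iterations and
    // returns the average time per measured call in nanoseconds.
    static double averageNanos(IntSupplier task, int warmupIterations, int measuredIterations) {
        long measuredNanos = 0;
        int total = warmupIterations + measuredIterations;
        for (int i = 0; i < total; i++) {
            boolean warmup = i < warmupIterations; // the flag separating the two phases
            long start = System.nanoTime();
            sink = task.getAsInt();
            long elapsed = System.nanoTime() - start;
            if (!warmup) {
                measuredNanos += elapsed;
            }
        }
        return (double) measuredNanos / measuredIterations;
    }

    public static void main(String[] args) {
        double avg = averageNanos(() -> "12345".length(), 100_000, 100_000);
        System.out.println("average ns per call: " + avg);
    }
}
```

Even this small loop leaves open questions such as how many warmup iterations are enough, how to account for the timer overhead and how to isolate runs from each other, which is why repeating it by hand for every benchmark is error-prone.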

Using JMH for benchmarks
Fortunately, there is the Java Microbenchmark Harness. It takes care of tasks like the warmup for us. If you are already familiar with JUnit tests (and you should be), you will find it very comfortable to get started with JMH.

Set up the JMH environment
To create a Maven benchmark project, just use the Maven archetype and provide your preferred groupId, artifactId and version.

mvn archetype:generate \
    -DinteractiveMode=false \
    -DarchetypeGroupId=org.openjdk.jmh \
    -DarchetypeArtifactId=jmh-java-benchmark-archetype \
    -DgroupId=com.example \
    -DartifactId=jmh-number-verification-performance-test \
    -Dversion=1.0

That command creates a skeleton project which can execute your benchmarks. After you have written your benchmarks (as described below), build the project with mvn clean install. The build creates a benchmarks.jar in the target folder, which should be used to run the measurements:

java -jar target/benchmarks.jar

Although you could use your IDE to run the tests, you should prefer this standalone JAR. It provides great portability – you can execute it on different machines – and there is no performance penalty due to any IDE overhead.

Writing benchmarks
Writing benchmarks is as simple as writing JUnit tests. The main difference is that you annotate a test method with @Benchmark instead of @Test. Just use the archetype-generated class (MyBenchmark), rename it or write your own class, and invoke the suspicious code you want to measure within a @Benchmark method. JMH does everything else and generates a performance report for you.

As with JUnit, it is also possible to use parameterized tests. This is the purpose of the @Param annotation. There are a lot of examples for a bunch of use cases available on the project site.
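As a minimal sketch of how @Benchmark and @Param fit together, consider the following class. It is not taken from the example project; the package, class name, parameter values and the digit check are assumptions chosen for illustration, and the class needs the JMH dependencies from the generated project to compile.

```java
package com.example; // hypothetical package; JMH does not accept the default package

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// Hypothetical benchmark class, not part of the example project.
@State(Scope.Benchmark) // @Param fields have to live in a @State class
public class DigitCheckBenchmark {

    // JMH runs the benchmark once for each listed value.
    @Param({"12345", "12a45"})
    public String input;

    // The result is returned to JMH so that the call is not treated as unused.
    @Benchmark
    public boolean matchesDigits() {
        return input.matches("\\d+");
    }
}
```

The report then contains one line per parameter value, which makes it easy to see whether valid and invalid inputs behave differently.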

Long story short: to try it out, I created a benchmark which compares different approaches to checking whether a String represents a valid Integer value. It compares the following implementations:

– using try-catch with Integer.parseInt(String)
– StringUtils.isNumeric(String)
– String.matches("\\d+")
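The three candidates could be sketched in plain Java as follows. The class and method names are hypothetical, and the second method is a hand-written stand-in that only approximates what StringUtils.isNumeric(String) (presumably the Apache Commons Lang method) does, so that the snippet has no external dependency.

```java
// Plain-Java sketch of the three approaches; all names are hypothetical.
class IntegerChecks {

    // Approach 1: let Integer.parseInt decide and map the exception to false.
    static boolean viaParseInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Approach 2 (stand-in): digit loop approximating StringUtils.isNumeric(String).
    // The benchmark described in the article calls the library method instead.
    static boolean viaDigitLoop(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Approach 3: regular expression via String.matches.
    static boolean viaRegex(String s) {
        return s != null && s.matches("\\d+");
    }
}
```

Note that the approaches are not strictly equivalent: "-5" is accepted only by viaParseInt, while "99999999999" passes the digit-based checks but overflows an int. A fair comparison should therefore use inputs on which all variants agree.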

Check out the example project on GitHub. With that benchmark, we can find out which approach performs best.

Performance results
By default, JMH executes 10 forks (separate execution environments), 20 warmup iterations (without measurement, giving the JVM the opportunity to optimize the code before the measurement starts) and 20 real measurement iterations for every test. These are the defaults of the JMH version used here; other releases may differ. Of course, this behavior can be overridden per annotation (@Fork, @Warmup, @Measurement).
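Overriding these settings could look like the sketch below. The package, class name, benchmark body and the concrete numbers are assumptions for illustration, not recommendations; the class needs the JMH dependencies to compile.

```java
package com.example; // hypothetical package

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

// Hypothetical benchmark with explicit run settings instead of the defaults.
@State(Scope.Thread)
@Fork(2)                                                             // two separate JVM forks
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)       // five untimed warmup iterations
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) // ten measured iterations
public class TunedBenchmark {

    // Non-final field, so the JVM cannot fold the computation into a constant.
    private double x = Math.PI;

    @Benchmark
    public double log() {
        return Math.log(x);
    }
}
```

The annotations can be placed on the class, as here, or on individual @Benchmark methods if only some of them need different settings.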

The results will vary depending on the configuration of the machine the benchmarks run on. After the runs have finished, JMH generates a report. The following output is the report generated by the comparison benchmark mentioned above, running on my i7 laptop (7700HQ, 4 x 2.8 GHz, 32 GB RAM):

Since you can execute your benchmarks in different @BenchmarkModes, you have to read the results differently. For example, in Mode.AverageTime a lower score is better, while in Mode.Throughput a higher score indicates better performance.
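The relationship between the two readings can be illustrated with a little arithmetic. The following sketch is a simplification with made-up numbers (they are not measurements), and the class and method names are hypothetical: average time is elapsed time divided by operations, throughput is operations divided by elapsed time.

```java
// Simplified arithmetic behind the two score types; names and numbers are illustrative only.
class ScoreReading {

    // Average time per operation in nanoseconds: lower is better.
    static double averageTimeNanos(long operations, long elapsedNanos) {
        return (double) elapsedNanos / operations;
    }

    // Throughput in operations per second: higher is better.
    static double throughputPerSecond(long operations, long elapsedNanos) {
        return operations / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        long oneSecond = 1_000_000_000L;
        // Variant A completes 2,000,000 calls in one second, variant B 4,000,000.
        System.out.println(averageTimeNanos(2_000_000, oneSecond));    // 500.0 ns per call
        System.out.println(averageTimeNanos(4_000_000, oneSecond));    // 250.0 ns per call
        System.out.println(throughputPerSecond(2_000_000, oneSecond)); // 2000000.0 calls per second
        System.out.println(throughputPerSecond(4_000_000, oneSecond)); // 4000000.0 calls per second
    }
}
```

Variant B is the faster one in both views: its average time is lower and its throughput is higher. Mixing up the two directions when reading a report leads to exactly the wrong conclusion.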

Beware of the JVM optimizations
As mentioned before, the JVM will optimize code based on collected information during execution. Usually this is a good thing we should appreciate, at least for production environments. But under artificial conditions (our microbenchmark definitely is one) this could cause problems. Here are some topics you should be aware of:

Warmup:
The first big obstacle is overcome by using JMH itself, since it delivers the warmup cycles out of the box. The JVM can thus collect some information about the code under test, and the effectively executed code will be more “production-like” than a method executed only once could ever be.

Always read computed results:
If you don’t use code (e.g. if you never read a private variable), the JVM is free to discard that code during compilation. This is called “Dead Code Elimination”, which means that even the entire computation of these dead results will probably be eliminated if no one is interested in them. This will definitely distort your benchmark results and can lead to false conclusions. So take an interest (or at least pretend to) and read your computation results even if they are not relevant for your test. This can be done either by returning the result from the benchmark method or by passing it to a so-called Blackhole, which JMH injects when you declare it as an input parameter of your benchmark method.
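Both ways of consuming results can be sketched as follows. The package, class, field and method names are hypothetical and the digit check is only a placeholder for the code under test; the class needs the JMH dependencies to compile.

```java
package com.example; // hypothetical package

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

// Hypothetical benchmark contrasting discarded and consumed results.
@State(Scope.Thread)
public class ResultConsumptionBenchmark {

    // Non-final fields, so the inputs are not compile-time constants.
    private String valid = "12345";
    private String invalid = "12a45";

    // Risky: the result is never read, so the JVM may eliminate (parts of) the call.
    @Benchmark
    public void discarded() {
        valid.matches("\\d+");
    }

    // Option 1: return the result; JMH consumes it for you.
    @Benchmark
    public boolean returned() {
        return valid.matches("\\d+");
    }

    // Option 2: pass several results to the injected Blackhole.
    @Benchmark
    public void consumed(Blackhole blackhole) {
        blackhole.consume(valid.matches("\\d+"));
        blackhole.consume(invalid.matches("\\d+"));
    }
}
```

Returning the value is the simplest choice when a benchmark produces a single result; the Blackhole is useful when a method computes more than one value that has to be kept alive.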

Differences to production code:
We’re done with this short introduction to JMH. Although we talked about reliable results, please be aware that code in tests will always behave differently to the same code executed in production. The JVM applies many optimizations at runtime, depending for example on how often methods are invoked (hot code), on call hierarchies and on stack depth. So performance tests are at most a good hint, but no guarantee. The best thing you can do is measure performance in production using metrics or profiling.

Blog author

Kevin Peters

Senior IT Software Engineer / Consultant
