Microbenchmarking your Scala code

29.11.2019 | 11 minutes reading time

Motivation

I am sure you recognize this loading spinner icon. I do not know anyone who likes to wait for the computer. However, when writing software I usually favour readability, maintainability, and extensibility over speed. I agree with Donald Knuth that premature optimization usually causes more problems than it solves.

Nevertheless, at some point you are going to write code where performance matters, or at least where bad performance hurts. In this situation it is useful to look at the performance characteristics of your code. I personally like to combine two approaches:

Complexity analysis
Runtime benchmarks

In this blog post I want to focus on runtime benchmarks only, specifically microbenchmarking. The next section sets a few theoretical foundations. Afterwards we look at ScalaMeter, a tool for automated performance testing in Scala. The section after that contains a few examples comparing the runtime performance of different implementations. We finish the blog post by summarizing the main points. The examples are written in Scala and relate to the Scala programming language, but they should be understandable for anyone with a bit of functional programming knowledge.

If you want to learn more about how to analyze the complexity of your algorithms, I can recommend the amazing book “Introduction to Algorithms” [1].

Microbenchmarking

In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. [2]

In benchmarking there are different granularity levels, similar to system tests, integration tests, and unit tests in functional testing. Microbenchmarking typically refers to isolated benchmarks of individual methods, e.g. API calls.

Similar to unit tests, automated microbenchmarks cannot give you any guarantees. The results depend heavily on the selected input, and interaction effects between different components of your architecture are not taken into consideration. Nevertheless, they are a useful tool to compare the relative performance of different implementations. They can also be used for regression testing.

ScalaMeter

“ScalaMeter is a microbenchmarking and performance regression testing framework for the JVM platform that allows expressing performance tests in a way which is both simple and concise.” [3]

A simple benchmark looks like this: [4]

import org.scalameter.api._

object RangeBenchmark
extends Bench.LocalTime {
  val sizes = Gen.range("size")(300000, 1500000, 300000)

  val ranges = for {
    size <- sizes
  } yield 0 until size

  performance of "Range" in {
    measure method "map" in {
      using(ranges) in {
        r => r.map(_ + 1)
      }
    }
  }
}

It generates integer ranges starting at 0 with 300,000, 600,000, 900,000, 1,200,000, and 1,500,000 elements respectively (0 until size excludes the upper bound). It then measures the run time of the map operation on these ranges and generates the following output:

Parameters(size -> 300000):  1.653809 ms
Parameters(size -> 600000):  3.282649 ms
Parameters(size -> 900000):  4.939347 ms
Parameters(size -> 1200000): 6.492767 ms
Parameters(size -> 1500000): 8.148826 ms

ScalaMeter provides a highly configurable testing framework with default configuration for different standard use cases from quick console reporting all the way to sophisticated regression testing with HTML reporting. I find the following features especially useful:

Concise and readable DSL for data generation and test specification
Configurable execution (e.g. separate JVM, warm-up runs, measured runs)
Configurable measurements and aggregations (ignoring GC, outlier elimination, mean, median, …)
Configurable reporting (text, HTML, logging, charts, DSV, …)
Configurable persistence (Java or JSON serialization)
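
To make the warm-up runs and the aggregation from this list a bit more concrete, here is a hand-rolled sketch that uses only the standard library. It is not how ScalaMeter is implemented. The name medianNanos and its parameters are made up for illustration, and a real framework does considerably more (separate JVMs, outlier handling, GC awareness).

```scala
object NaiveTimingSketch {
  /** Runs `body` `warmups` times without measuring, then `runs` times
    * with measuring, and returns the median duration in nanoseconds.
    * Hypothetical helper for illustration only.
    */
  def medianNanos[A](warmups: Int, runs: Int)(body: => A): Long = {
    require(runs > 0, "need at least one measured run")

    // Warm-up: give the JIT compiler a chance to optimize the code path.
    var i = 0
    while (i < warmups) {
      body
      i += 1
    }

    // Measured runs: record the elapsed time of each invocation.
    val samples = new Array[Long](runs)
    i = 0
    while (i < runs) {
      val start = System.nanoTime()
      body
      samples(i) = System.nanoTime() - start
      i += 1
    }

    // Aggregate: the median is less sensitive to outliers than the mean.
    java.util.Arrays.sort(samples)
    samples(runs / 2)
  }
}
```

A call could look like NaiveTimingSketch.medianNanos(warmups = 20, runs = 11) { (0 until 300000).map(_ + 1) }. The numbers it returns vary from machine to machine and from run to run, which is one reason to leave this work to a framework.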

In the next section we are going to look at some experiments where I used ScalaMeter to perform the measurements.

Example experiments

In this section we are going to look at three experiments:

How do chained map operations perform compared to a single combined map operation?
How do different collections perform when being sorted? How does the Scala sort implementation perform compared to the native Java one?
When building up a collection, how does the performance differ when using a builder vs. concatenating?

All experiments are performed using ScalaMeter 0.9 and Scala 2.12.4. My computer has a 2016 3.3 GHz Intel Core i7 with 16 GB of RAM. I am using the Bench.OfflineReport, which executes the code in a separate JVM and applies an appropriate number of warm-up runs.

Chained Map Operations

Motivation

When working with collections in Scala, the map operation is quite common. xs.map(f) applies the function f to every element x in xs and returns a collection of the results. If you have two composable functions f and g and you want to apply both, you can express that either as

xs.map(g compose f), or
xs.map(f).map(g)

In terms of the result, both operations are equivalent. The memory footprint and runtime however might differ, depending on the implementation of xs and map. If you are using a strictly evaluated collection, on every map call the result will be computed. If the collection is immutable, a new collection will be created with the resulting values.

In this experiment we want to look at the relative runtime performance of both expressions comparing a List (strict) and a SeqView (lazy).
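
As a small sketch of the two expressions and of the difference between strict and lazy evaluation, consider the following. The object name, the three-element list, and the helper callsBeforeAndAfterForce are made up for illustration. The sketch uses toList to trigger the computation of the view because that works the same way across Scala versions.

```scala
object ChainedMapSketch {
  val f: Int => Int = _ + 1
  val g: Int => Int = _ * 2
  val xs: List[Int] = List(1, 2, 3)

  // Strict list, chained: two traversals and one intermediate list.
  val chained: List[Int] = xs.map(f).map(g)

  // Strict list, combined: one traversal and no intermediate list.
  val combined: List[Int] = xs.map(g compose f)

  // View: both maps are only recorded; toList runs them in a single pass.
  val viaView: List[Int] = xs.view.map(f).map(g).toList

  /** Counts how often the mapping function ran before and after
    * the view was turned into a list.
    */
  def callsBeforeAndAfterForce(): (Int, Int) = {
    var calls = 0
    val view = xs.view.map { x => calls += 1; f(x) }
    val before = calls        // nothing has been computed yet
    val forced = view.toList  // this triggers the computation
    (before, calls)
  }
}
```

All three values contain the same elements. Only the number of traversals and intermediate collections differs, which is what the experiment measures.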

Variables

val strictList = List.iterate(0, 1000000)(_ + 1)
val lazyList = strictList.view
val f: Int => Int = _ + 1
val fs = List.fill(10)(f)
val fsAndThen = fs.reduce(_ andThen _)

Experiments

Given both the strictList and the lazyList as l, we perform the following two experiments for both of them. Note that we omit the force command here for the sake of simplicity. In the experiment it is needed to actually trigger the computation of the view.

l.map(fsAndThen), which applies f 10 times in a single map operation
fs.foldLeft(l)((l, f) => l.map(f)), which applies f one time in each of the 10 map operations
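
Both experiments can be tried out on a small strict list as follows. This is only a sketch: it uses a five-element list instead of the benchmark input, and the object name and the value small are made up. For the view, the same expressions apply, followed by the step that triggers the computation.

```scala
object FoldedMapsSketch {
  val f: Int => Int = _ + 1
  val fs: List[Int => Int] = List.fill(10)(f)
  val fsAndThen: Int => Int = fs.reduce(_ andThen _)

  // Small stand-in for the one-million-element list of the experiment.
  val small: List[Int] = List.iterate(0, 5)(_ + 1)

  // Experiment 1: a single map that applies f ten times per element.
  val single: List[Int] = small.map(fsAndThen)

  // Experiment 2: ten map calls that each apply f once per element.
  val chained: List[Int] = fs.foldLeft(small)((l, f) => l.map(f))
}
```

Both values hold the elements 10 to 14, so the two experiments only differ in how much work is done to get there.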

Results

Looking at these results I find three notable observations:

On the strict list, the chained map operations took more than three times longer on average than using the single map operation.
This effect is not present when using the list view.
The performance of the chained map on the list view is comparable to the strict list single map results.

Given these results, we can draw the following conclusions. Using chained map operations on strictly evaluated, immutable collections can have a significant performance impact. If performance matters, you should aim to combine your map operations. If you cannot combine the map operations yourself (maybe you are just providing a library, like Apache Spark), using a lazily evaluated collection can help reduce the run time significantly.

Sorting Data Structures

Motivation

Sorting a collection is required in many applications, whether it is showing a list of events ordered by their time of occurrence or preparing a table to be joined with another one using a merge-join algorithm. 20 years ago, developers had to be able to write efficient sorting algorithms themselves, as standard libraries were not as rich and computers not as fast.

Nowadays you will find fast-enough implementations of sorting algorithms in almost any standard library. If you are not dealing with strict performance requirements, this is also fine, as using available standard functions can make the code less buggy and more readable.

Scala offers a method to sort immutable collections called sorted, which is available for all standard sequence-like collections. In this experiment we want to compare the relative performance of sorted on different Scala data structures, and also compare it to the performance of the java.util.Arrays.sort method.

Variables

val size = Gen.enumeration("size")(List.iterate(1, 7)(_ * 10): _*)
val list = for { s <- size } yield List.fill(s)(Random.nextInt)
val array = for { l <- list } yield l.toArray
val vector = for { l <- list } yield l.toVector

Experiments

Given the list, array, and vector as l, filled with random integers, we sort them using the Scala sorted method. For the array, we also apply the Java sort, which works in-place. In order to make the results comparable to Scala, which gives you a new collection back instead of modifying the existing one, we also copy the array first in another experiment.

l.sorted
util.Arrays.sort(l)
val newArray = new Array[Int](l.length); Array.copy(l, 0, newArray, 0, l.length); util.Arrays.sort(newArray)
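
The difference between the three variants listed above is easiest to see on a tiny array. The following sketch uses made-up names and a three-element input. It only shows which variant touches the original data and does not measure anything.

```scala
object SortVariantsSketch {
  val original: Array[Int] = Array(3, 1, 2)

  // Scala: sorted returns a new array and leaves the input untouched.
  val viaSorted: Array[Int] = original.sorted

  // Java on a copy: copy first, then sort the copy in place.
  val copy: Array[Int] = new Array[Int](original.length)
  Array.copy(original, 0, copy, 0, original.length)
  java.util.Arrays.sort(copy)

  // Java in place: the array passed in is modified directly.
  val inPlace: Array[Int] = Array(3, 1, 2)
  java.util.Arrays.sort(inPlace)
}
```

Only the last variant modifies its input, which is why the copy is needed to make the Java sort comparable to the Scala one.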

Results

Looking at the performance for different collection sizes there are no surprises (log scale would’ve been better I know…). When looking at the difference between the Scala and Java array sort, I was kind of surprised.

The Scala sorts are so much slower. Looking at the implementation of sorted, we can spot two details which might explain the difference in runtime.

def sorted[B >: A](implicit ord: Ordering[B]): Repr = {
  val len = this.length
  val b = newBuilder
  if (len == 1) b ++= this
  else if (len > 1) {
    b.sizeHint(len)
    val arr = new Array[AnyRef](len)
    var i = 0
    for (x <- this) {
      arr(i) = x.asInstanceOf[AnyRef]
      i += 1
    }
    java.util.Arrays.sort(arr, ord.asInstanceOf[Ordering[Object]])
    i = 0
    while (i < arr.length) {
      b += arr(i).asInstanceOf[A]
      i += 1
    }
  }
  b.result()
}

You can see that it also uses Arrays.sort internally. But why is it so much slower? I think the reason for that is that it does not only copy the data to a new array but also has to copy it back to a collection of the original type. When sorting an immutable list you expect to get another immutable list back. This is done using a respective builder b (e.g. a ListBuilder):

while (i < arr.length) {
  b += arr(i).asInstanceOf[A]
  i += 1
}

The creation of the initial array could also be a reason for it being slower. For making the copy, I was using Array.copy, which uses the native method java.lang.System.arraycopy under the hood. In the Scala method, the array is filled element by element in a loop:

for (x <- this) {
  arr(i) = x.asInstanceOf[AnyRef]
  i += 1
}
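
One more detail is visible in the sorted listing: the intermediate array is an Array[AnyRef]. For a collection of Int this means that every element is boxed on the way in and unboxed on the way out, and that the object overload of Arrays.sort with a comparator is used instead of the int[] overload. Whether this contributes noticeably to the measured difference was not tested in this post. The sketch below only puts the two code paths side by side, using made-up names.

```scala
object BoxedSortSketch {
  val ints: Array[Int] = Array(3, 1, 2)

  // Primitive path: sorts an int[] directly, no boxing involved.
  val primitive: Array[Int] = ints.clone()
  java.util.Arrays.sort(primitive)

  // Object path, mirroring the listing: every Int becomes a java.lang.Integer
  // and the comparison goes through the Ordering used as a Comparator.
  val boxed: Array[AnyRef] = ints.map(x => x.asInstanceOf[AnyRef])
  java.util.Arrays.sort(boxed, Ordering.Int.asInstanceOf[Ordering[Object]])

  // Unboxing on the way back, as the builder loop in the listing does.
  val unboxed: Array[Int] = boxed.map(_.asInstanceOf[Int])
}
```

Both paths produce the same order. They differ in the element representation and in the Arrays.sort overload that ends up doing the work.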

Looking at those results, I think it is obvious that when it comes to sorting, you should always check the performance of your implementation if it matters to you. Using a sort algorithm that immediately outputs a new immutable collection instead of relying on an intermediate array could be the better choice in that case. Nevertheless, most of the time the benefits you gain from using standard methods outweigh the performance gain of custom solutions.
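
To illustrate what a sort that directly outputs a new immutable collection could look like, here is a textbook merge sort on List. It is a sketch only: the name mergeSort is made up, it was not part of the benchmarks in this post, and nothing here suggests that it would beat sorted in practice. If you try something like it, measure it.

```scala
import scala.annotation.tailrec

object ListMergeSortSketch {
  /** Stable merge sort that works on immutable lists only,
    * without copying the elements into an intermediate array.
    */
  def mergeSort[A](xs: List[A])(implicit ord: Ordering[A]): List[A] = {
    // Merges two sorted lists; the accumulator keeps the recursion stack-safe.
    @tailrec
    def merge(l: List[A], r: List[A], acc: List[A]): List[A] = (l, r) match {
      case (Nil, _) => acc.reverse ::: r
      case (_, Nil) => acc.reverse ::: l
      case (lh :: lt, rh :: rt) =>
        // Taking from the left on ties keeps equal elements in input order.
        if (ord.lteq(lh, rh)) merge(lt, r, lh :: acc)
        else merge(l, rt, rh :: acc)
    }

    if (xs.lengthCompare(1) <= 0) xs
    else {
      val (left, right) = xs.splitAt(xs.length / 2)
      merge(mergeSort(left), mergeSort(right), Nil)
    }
  }
}
```

ListMergeSortSketch.mergeSort(List(3, 1, 2)) returns List(1, 2, 3), the same result as List(3, 1, 2).sorted.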

Concatenation vs. Builder

Motivation

One of the most common beginner mistakes when working with Java is building up strings using concatenation. In Java, strings are immutable. As they are so commonly used, the JVM maintains a string pool (see flyweight pattern).

This has an implication when concatenating strings. The following code will create three string objects:

var s1 = "Hello "
s1 = s1 + "World"

While this is not a problem in most cases, it can become one when used inside loops. When building up immutable data structures you should thus always rely on the respective builders, which are mutable, and then create the final, immutable structure when you are done. This is fine as long as the builder cannot be accessed outside the local scope.
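
As a minimal sketch of this advice for strings, the following two helpers produce the same result. The names viaConcat and viaBuilder are made up for illustration. The first one creates a new string in every step, while the second one appends to a mutable builder that never leaves the method.

```scala
object StringBuildingSketch {
  // Concatenation: every step allocates a new, longer String.
  def viaConcat(parts: Seq[String]): String =
    parts.foldLeft("")((acc, p) => acc + p)

  // Builder: one mutable buffer, converted to a String once at the end.
  // The builder is local, so its mutability is invisible to callers.
  def viaBuilder(parts: Seq[String]): String = {
    val b = new StringBuilder
    parts.foreach(p => b ++= p)
    b.result()
  }
}
```

For Seq("Hello ", "World") both helpers return "Hello World".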

In this experiment we are going to compare the performance of building up immutable data structures step by step using concatenation on the immutable object vs. using a builder to construct the final structure.

Variables

val size = Gen.single("size")(10000)
val numbers = for { s <- size } yield 0 to s
val strings = for { range <- numbers } yield range.map(_.toString)

Experiments

Given the range of numbers and strings as xs, we compare the performance of constructing a new string, set, and list out of them step by step. First we use plain concatenation (+ and ::), then we use the appropriate builder.

String

xs.foldLeft("")((l, x) => l + x)
val b = StringBuilder.newBuilder; xs.foreach(x => b ++= x); b.result

Set

xs.foldLeft(Set.empty[Int])((l, x) => l + x)
val b = Set.newBuilder[Int]; xs.foreach(x => b += x); b.result

List

xs.foldLeft(List.empty[Int])((l, x) => x :: l)
val b = List.newBuilder[Int]; xs.foreach(x => b += x); b.result
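
The set and list variants can be compared on a small range as follows. The object and value names are made up, and the input is 0 to 4 instead of the benchmark input. One detail worth noting is that prepending with :: yields the elements in reverse order, whereas the builder keeps the insertion order.

```scala
object BuilderVsConcatSketch {
  val xs: Range = 0 to 4

  // Set, step by step on the immutable value.
  val setConcat: Set[Int] = xs.foldLeft(Set.empty[Int])((s, x) => s + x)

  // Set, via a local mutable builder.
  val setBuilt: Set[Int] = {
    val b = Set.newBuilder[Int]
    xs.foreach(x => b += x)
    b.result()
  }

  // List, prepending each element: the result is reversed.
  val listConcat: List[Int] = xs.foldLeft(List.empty[Int])((l, x) => x :: l)

  // List, via a local mutable builder: insertion order is kept.
  val listBuilt: List[Int] = {
    val b = List.newBuilder[Int]
    xs.foreach(x => b += x)
    b.result()
  }
}
```

The two sets are equal, while listConcat is List(4, 3, 2, 1, 0) and listBuilt is List(0, 1, 2, 3, 4).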

Results

All three data structures are immutable. But what is the performance implication of using concatenation? Let’s look at strings first.

The graph shows that the overhead of creating additional string objects for each iteration is immense. How does it behave for sets?

For sets it is actually even worse. This is probably because it has to initialize the whole set again for every iteration. Building up a hash set is no trivial operation, which makes the performance impact even higher than for strings. Does the same hold for lists?

With lists the picture is different. The reason is that it requires only constant time to prepend an item to the front of the list. If we appended to the end instead, the run time might be longer, depending on whether the implementation stores a pointer to the end of the list as well.
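
Why prepending is cheap can be seen from how the immutable List shares structure. In the following sketch (names made up for illustration) the prepended list reuses the existing list as its tail, while appending has to build new cells for all existing elements because the original list must stay unchanged.

```scala
object ListPrependSketch {
  val existing: List[Int] = List(2, 3)

  // Prepending allocates a single new cell that points at the existing list.
  val prepended: List[Int] = 1 :: existing

  // Appending copies the existing cells before adding the new last element.
  val appended: List[Int] = existing :+ 4

  // Reference equality: the tail of the prepended list is the existing list itself.
  val sharesTail: Boolean = prepended.tail eq existing
}
```

Here sharesTail is true, and existing is still List(2, 3) after both operations.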

If you don’t know what collection implementation you are using, you can safely rely on the newBuilder function just to make sure.

Summary

In this blog post we looked at ScalaMeter, a tool for automated microbenchmarking and regression testing in Scala. We’ve seen three examples of how small details in the code can significantly change your runtime performance.

While it is better to stick to standard methods and libraries for the sake of maintainability and readability, it might require custom solutions if performance matters to you. In any case, by measuring the performance in an automated way you can gain more confidence in your code and also catch performance regressions automatically.

Did you write any performance tests in your projects? Did you know that the JVM needs to warm up before you can reliably measure runtime performance? Was there any performance regression introduced into one of your projects that you would’ve caught with some automated performance tests? What is your favourite benchmarking tool for the JVM? I’m curious to know your thoughts!

References

[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2001. Introduction to Algorithms (2nd ed.). McGraw-Hill Higher Education.
[2] Philip J. Fleming and John J. Wallace. 1986. How not to lie with statistics: the correct way to summarize benchmark results. Communications of the ACM 29 (3): 218–221.
[3] ScalaMeter Homepage
[4] ScalaMeter Getting Started Guide

Cover image taken by Jérôme S

Blog author

Frank Rosner
