Microbenchmarking your Scala code

29.11.2019 | 8 minutes of reading time


I am sure you recognize this loading spinner icon. I do not know anyone who likes to wait for the computer. However, when writing software I usually favour readability, maintainability, and extensibility over speed. I agree with Donald Knuth that premature optimization usually causes more problems than it solves.

Nevertheless, at some point you are going to write code where performance matters, or at least where bad performance hurts. In this situation it is useful to look at the performance characteristics of your code. I personally like to combine two approaches:

  • Complexity analysis
  • Runtime benchmarks

In this blog post I want to focus on runtime benchmarks only, specifically microbenchmarking. The next section sets a few theoretical foundations. Afterwards we look at ScalaMeter, a tool for automated performance testing in Scala. The last section contains a few examples comparing the runtime performance of different implementations. We finish the blog post by summarizing the main points. The examples are written in Scala and relate to the Scala programming language, but they should be understandable for anyone with a bit of functional programming knowledge.

If you want to learn more about how to analyze the complexity of your algorithms, I can recommend the amazing book “Introduction to Algorithms” [1].


In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. [2]

In benchmarking there are different granularity levels, similar to system tests, integration tests, and unit tests in functional testing. Microbenchmarking typically refers to isolated benchmarks of individual methods, e.g. API calls.

Similar to unit tests, automated microbenchmarks cannot give you any guarantees. The results depend heavily on the selected input, and interaction effects between different components of your architecture are not taken into consideration. Nevertheless, they are a useful tool to compare the relative performance of different implementations. They can also be used for regression testing.
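To make the idea concrete, here is a minimal hand-rolled sketch of what a microbenchmark does: run the code a few times untimed to warm up the JVM, then time a number of measured runs and aggregate them. The names NaiveBench, measureMedianNanos, warmupRuns, and measuredRuns are made up for this illustration. A real framework deals with many more pitfalls (GC pauses, dead-code elimination, separate JVMs) than this sketch does.

```scala
object NaiveBench {
  /** Runs `body` `warmupRuns` times untimed, then `measuredRuns` times timed,
    * and returns the median wall-clock time of the measured runs in nanoseconds.
    * Hypothetical helper for illustration only, not part of any library. */
  def measureMedianNanos[A](warmupRuns: Int, measuredRuns: Int)(body: => A): Long = {
    require(measuredRuns > 0, "need at least one measured run")
    // Keep the result reachable so the call is less likely to be optimized away.
    var sink: Any = null
    var i = 0
    while (i < warmupRuns) { sink = body; i += 1 }
    val samples = new Array[Long](measuredRuns)
    i = 0
    while (i < measuredRuns) {
      val start = System.nanoTime()
      sink = body
      samples(i) = System.nanoTime() - start
      i += 1
    }
    java.util.Arrays.sort(samples)
    samples(measuredRuns / 2) // upper median for an even number of runs
  }
}
```

A call could look like NaiveBench.measureMedianNanos(warmupRuns = 20, measuredRuns = 50)(xs.map(_ + 1)) for some collection xs. The absolute number is only meaningful relative to other measurements taken the same way on the same machine.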


“ScalaMeter is a microbenchmarking and performance regression testing framework for the JVM platform that allows expressing performance tests in a way which is both simple and concise.” [3]

A simple benchmark looks like this: [4]

import org.scalameter.api._

object RangeBenchmark
extends Bench.LocalTime {
  val sizes = Gen.range("size")(300000, 1500000, 300000)

  val ranges = for {
    size <- sizes
  } yield 0 until size

  performance of "Range" in {
    measure method "map" in {
      using(ranges) in {
        r => r.map(_ + 1)
      }
    }
  }
}

It generates integer ranges from 0 up to (but excluding) 300,000, 600,000, 900,000, 1,200,000, and 1,500,000, respectively. It then measures the run time of the map operation on these ranges and generates the following output:

Parameters(size -> 300000):  1.653809 ms
Parameters(size -> 600000):  3.282649 ms
Parameters(size -> 900000):  4.939347 ms
Parameters(size -> 1200000): 6.492767 ms
Parameters(size -> 1500000): 8.148826 ms
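A quick sanity check on these numbers: dividing each run time by its range size gives the cost per element, which stays roughly constant at about 5.4 to 5.5 ns, as you would expect from a linear-time map. The sketch below only redoes that arithmetic on the reported output. PerElementCost and its members are names chosen for this illustration.

```scala
object PerElementCost {
  // (size, run time in milliseconds), copied from the reported benchmark output
  val results: Seq[(Int, Double)] = Seq(
    300000  -> 1.653809,
    600000  -> 3.282649,
    900000  -> 4.939347,
    1200000 -> 6.492767,
    1500000 -> 8.148826
  )

  // Nanoseconds spent per element: milliseconds * 1e6 / size
  val nanosPerElement: Seq[Double] =
    results.map { case (size, millis) => millis * 1e6 / size }

  def main(args: Array[String]): Unit =
    results.zip(nanosPerElement).foreach { case ((size, _), ns) =>
      println(f"size = $size%7d: $ns%.2f ns per element")
    }
}
```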

ScalaMeter provides a highly configurable testing framework with default configuration for different standard use cases from quick console reporting all the way to sophisticated regression testing with HTML reporting. I find the following features especially useful:

  • Concise and readable DSL for data generation and test specification
  • Configurable execution (e.g. separate JVM, warm-up runs, measured runs)
  • Configurable measurements and aggregations (ignoring GC, outlier elimination, mean, median, …)
  • Configurable reporting (text, HTML, logging, charts, DSV, …)
  • Configurable persistence (Java or JSON serialization)
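To illustrate what the aggregation options mean, here is a small sketch of mean, median, and a deliberately naive form of outlier elimination over a list of measured run times. This is not how ScalaMeter implements these features. Aggregations, mean, median, and dropSlowest are hypothetical names used only for this example.

```scala
object Aggregations {
  /** Arithmetic mean; sensitive to a few very slow runs. */
  def mean(samples: Seq[Double]): Double = {
    require(samples.nonEmpty, "no samples")
    samples.sum / samples.size
  }

  /** Median; more robust against occasional slow runs, e.g. caused by GC. */
  def median(samples: Seq[Double]): Double = {
    require(samples.nonEmpty, "no samples")
    val sorted = samples.sorted
    val mid = sorted.size / 2
    if (sorted.size % 2 == 1) sorted(mid)
    else (sorted(mid - 1) + sorted(mid)) / 2
  }

  /** Naive outlier elimination (an assumption for this sketch):
    * drop the `k` slowest samples and return the rest in ascending order. */
  def dropSlowest(samples: Seq[Double], k: Int): Seq[Double] =
    samples.sorted.dropRight(k)
}
```

With the samples 1.0, 2.0, 3.0, and 100.0 the mean is dominated by the single slow run, while the median and the mean after dropping the slowest sample stay close to the typical run time.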

In the next section we are going to look at some experiments where I used ScalaMeter to perform the measurements.

Example experiments

In this section we are going to look at three experiments:

  1. How do chained map operations perform compared to a single combined map operation?
  2. How do different collections perform when being sorted? How does the Scala sort implementation perform compared to the native Java one?
  3. When building up a collection, how does the performance differ when using a builder vs. concatenating?

All experiments are performed using ScalaMeter 0.9 and Scala 2.12.4. My computer has a 2016 3.3 GHz Intel Core i7 with 16 GB of RAM. I am using Bench.OfflineReport, which executes the code in a separate JVM and applies an appropriate number of warm-up runs.

Chained Map Operations


When working with collections in Scala, the map operation is quite common. xs.map(f) applies the function f to every element x in xs and returns the result. If you have two composable functions f and g and you want to apply both, you express that either as

  • xs.map(g compose f), or
  • xs.map(f).map(g)

In terms of the result, both operations are equivalent. The memory footprint and runtime however might differ, depending on the implementation of xs and map. If you are using a strictly evaluated collection, on every map call the result will be computed. If the collection is immutable, a new collection will be created with the resulting values.
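A small sketch of that equivalence, using two made-up functions f and g and a three-element list. MapFusion is just a container name for this example.

```scala
object MapFusion {
  val f: Int => Int = _ + 1
  val g: Int => Int = _ * 2

  val xs: List[Int] = List(1, 2, 3)

  // Two traversals; the strict List materializes an intermediate list after map(f).
  val chained: List[Int] = xs.map(f).map(g)

  // One traversal; g compose f is x => g(f(x)), the same function as f andThen g.
  val combined: List[Int] = xs.map(g compose f)
}
```

Both values contain the same elements. The difference is only in how many intermediate collections are created along the way.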

In this experiment we want to look at the relative runtime performance of both expressions comparing a List (strict) and a SeqView (lazy).


val strictList = List.iterate(0, 1000000)(_ + 1)
val lazyList = strictList.view
val f: Int => Int = _ + 1
val fs = List.fill(10)(f)
val fsAndThen = fs.reduce(_ andThen _)


Given both the strictList and the lazyList as l, we perform the following two experiments for both of them. Note that we omit the force command here for the sake of simplicity. In the experiment it is needed to actually trigger the computation of the view.

  • l.map(fsAndThen), which applies f 10 times in a single map operation
  • fs.foldLeft(l)((l, f) => l.map(f)), which applies f once in each of the 10 map operations
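The difference between the strict and the lazy variant becomes visible when you trace the order in which the functions are called. The sketch below uses two traced functions instead of ten to keep the trace short, and it calls toList instead of force to trigger the view. EvaluationOrder, traced, strictTrace, and lazyTrace are names invented for this illustration.

```scala
import scala.collection.mutable.ListBuffer

object EvaluationOrder {
  /** Wraps `fn` so that each call appends `name` plus the argument to `log`. */
  def traced(name: String, log: ListBuffer[String])(fn: Int => Int): Int => Int =
    x => { log += s"$name($x)"; fn(x) }

  /** Chained maps on a strict List: the first map finishes before the second starts. */
  def strictTrace(xs: List[Int]): List[String] = {
    val log = ListBuffer.empty[String]
    xs.map(traced("f", log)(_ + 1)).map(traced("g", log)(_ + 1))
    log.toList
  }

  /** Chained maps on a view: nothing runs until toList, then each element
    * passes through both functions before the next element is touched. */
  def lazyTrace(xs: List[Int]): List[String] = {
    val log = ListBuffer.empty[String]
    val pending = xs.view.map(traced("f", log)(_ + 1)).map(traced("g", log)(_ + 1))
    assert(log.isEmpty) // the view has only recorded the two maps so far
    pending.toList
    log.toList
  }
}
```

For List(1, 2), strictTrace returns f(1), f(2), g(2), g(3), while lazyTrace returns f(1), g(2), f(2), g(3). The view therefore needs only one pass and no intermediate list.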


Looking at these results I find three notable observations:

  1. On the strict list, the chained map operations took more than three times longer on average than using the single map operation.
  2. This effect is not present when using the list view.
  3. The performance of the chained map on the list view is comparable to the strict list single map results.

Given these results, we can draw the following conclusions. Using chained map operations on strictly evaluated, immutable collections can have a significant performance impact. If performance matters, you should aim to combine your map operations. If you cannot combine the map operations yourself (maybe you are just providing a library, like Apache Spark), using a lazily evaluated collection can help reduce the run time significantly.

Sorting Data Structures


Sorting a collection is required in many applications, be it showing a list of events ordered by their time of occurrence or preparing a table to be joined with another one using a merge-join algorithm. Twenty years ago, developers had to be able to write efficient sorting algorithms themselves, as standard libraries were not as rich and computers not as fast.

Nowadays you will find fast-enough implementations of sorting algorithms in almost any standard library. If you are not dealing with strict performance requirements, this is also fine, as using available standard functions can make the code less buggy and more readable.

Scala offers a method to sort immutable collections called sorted, which is available for all standard sequence-like collections. In this experiment we want to compare the relative performance of sorted on different Scala data structures, and also compare it to the performance of the java.util.Arrays.sort method.
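The two approaches being compared can be sketched as follows. Since java.util.Arrays.sort sorts in place, the Java variant sorts a copy so that the input stays untouched, just like with sorted. How exactly the experiment copies the array is not shown in this sketch. Using Arrays.copyOf here is an assumption, and SortVariants with its two methods is a hypothetical name.

```scala
import java.util.Arrays

object SortVariants {
  // Scala: returns a new sorted collection of the same type, input untouched.
  def scalaSorted(xs: List[Int]): List[Int] = xs.sorted

  // Java: sorts a primitive array in place, so we sort a copy of the input.
  def javaSortedCopy(xs: Array[Int]): Array[Int] = {
    val copy = Arrays.copyOf(xs, xs.length)
    Arrays.sort(copy)
    copy
  }
}
```

Both variants produce the same ordering. What differs is the data structure you get back and the amount of copying needed to get there.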


import org.scalameter.api._
import scala.util.Random

// Sizes 1, 10, 100, ..., 1000000
val size = Gen.enumeration("size")(List.iterate(1, 7)(_ * 10): _*)
val list = for { s <- size } yield List.fill(s)(Random.nextInt)
val array = for { l <- list } yield l.toArray
val vector = for { l <- list } yield l.toVector

The sorted method is implemented as follows in the Scala standard library:

def sorted[B >: A](implicit ord: Ordering[B]): Repr = {
  val len = this.length
  val b = newBuilder
  if (len == 1) b ++= this
  else if (len > 1) {
    b.sizeHint(len)
    val arr = new Array[AnyRef](len)
    var i = 0
    for (x <- this) {
      arr(i) = x.asInstanceOf[AnyRef]
      i += 1
    }
    java.util.Arrays.sort(arr, ord.asInstanceOf[Ordering[Object]])
    i = 0
    while (i < arr.length) {
      b += arr(i).asInstanceOf[A]
      i += 1
    }
  }
  b.result()
}

You can see that it also uses Arrays.sort internally. But why is it so much slower? I think the reason for that is that it does not only copy the data to a new array but also has to copy it back to a collection of the original type. When sorting an immutable list you expect to get another immutable list back. This is done using a respective builder b (e.g. a ListBuilder):

while (i < arr.length) {
  b += arr(i).asInstanceOf[A]
  i += 1
}

The creation of the initial array could also be a reason for it being slower. To make the copy, I was using Arrays.copy, which uses the native method java.lang.System.arraycopy under the hood. In the Scala method, the array is filled element by element in a loop:

for (x <- this) {
  arr(i) = x.asInstanceOf[AnyRef]
  i += 1
}

Looking at those results, I think it is obvious that when it comes to sorting, you should always check the performance of your implementation if it matters to you. Using a sort algorithm that immediately outputs a new immutable collection instead of relying on an intermediate array would be the better choice. Nevertheless most of the time the benefits you gain from using standard methods outweigh the performance gain of custom solutions.

Concatenation vs. Builder


One of the most common beginner mistakes when working with Java is building up strings using concatenation. In Java, strings are immutable. As they are so commonly used, the JVM maintains a string pool (see flyweight pattern).

This has an implication when concatenating strings. The following code will create three string objects:

var s1 = "Hello "
s1 = s1 + "World"

While this is not a problem in most cases, it can become one when used inside loops. When building up immutable data structures you should thus always rely on the respective builders, which are mutable, and then create the final, immutable structure when you are done. This is fine as long as the builder cannot be accessed outside the local scope.

In this experiment we are going to compare the performance of building up immutable data structures step by step using concatenation on the immutable object vs. using a builder to construct the final structure.
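The two styles can be sketched like this for strings and for an immutable Vector. Which data structures the experiment covers in full is not visible from the setup, so the choice of String and Vector as well as the names ConcatVsBuilder, concatStrings, buildString, concatVector, and buildVector are assumptions made for this illustration.

```scala
object ConcatVsBuilder {
  // Concatenation: every step allocates a new immutable String.
  def concatStrings(parts: Seq[String]): String =
    parts.foldLeft("")(_ + _)

  // Builder: append to one mutable StringBuilder, create the String once at the end.
  def buildString(parts: Seq[String]): String = {
    val sb = new StringBuilder
    parts.foreach(p => sb.append(p))
    sb.result()
  }

  // The same two styles for an immutable Vector.
  def concatVector(n: Int): Vector[Int] =
    (0 until n).foldLeft(Vector.empty[Int])((acc, i) => acc :+ i)

  def buildVector(n: Int): Vector[Int] = {
    val b = Vector.newBuilder[Int]
    (0 until n).foreach(i => b += i)
    b.result()
  }
}
```

In both builder variants the mutable builder never leaves the method, so callers only ever see the final immutable value.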


val size = Gen.single("size")(10000)
val numbers = for { s <- size } yield 0 to s
val strings = for { range <- numbers } yield
