Scala Arrays – functional vs imperative

15.2.2016 | 5 minutes reading time

The Scala collections, which are part of the standard library, are known for their vast number of high-level functional operations like map, flatMap, filter, sliding or groupBy, just to name a handful. These not only allow for high developer productivity – just imagine implementing something like groupBy yourself every time you need it – but usually also give us reasonable or even excellent performance. This holds in particular for concurrent programming: the default collections in Scala are immutable, and sharing immutable objects avoids the cost of synchronization or defensive copies.

Nevertheless there are some situations when we have to pay a substantial penalty for using these nice high-level and thread-safe collections. Luckily Scala is a multi-paradigm language geared to real-world applications and hence lets us pick the right tool among several for the job at hand: In these situations, when collections and functional programming don’t give us the performance we need, we can use arrays and imperative programming.

The Use Case: Calculate Hex-Code for Bytes

Let’s take a look at the following use case: Write a function that takes an array of bytes – which is represented as an Array[Byte] in Scala source code and as a native JVM array after compilation – and returns an array of characters, i.e. an Array[Char], representing the concatenated hex-codes of all the bytes.

To give an example: Array[Byte](0, 1, 15, 16) should be transformed to Array[Char]('0', '0', '0', '1', '0', 'F', '1', '0'). As each byte corresponds to two hex characters, the resulting array has twice the size of the input.

Now you might ask how this is related to collections. Well, Scala allows us to treat an array as a collection of type scala.collection.Seq, i.e. like a sequence. Therefore we can apply all these high-level functional operations to arrays, e.g.:

scala> Array(1, 2, 3).map(_ + 1)
res0: Array[Int] = Array(2, 3, 4)

Of course we could use an existing library, e.g. Apache Commons Codec, but maybe we don’t want to depend on an external library for such a simple task or we just want to have some fun and hack some Scala ourselves.

Implementation Design

Scala allows us to add extension methods to any existing type without subclassing, simply by defining an implicit class that wraps a value of the type to be extended. By using a value class via extending AnyVal the Scala compiler is able to avoid creating instances of the wrapper and instead inline everything, so there’s negligible runtime overhead.

As we want to be able to call toHex and/or toHexString on any byte array, our implementation looks like this:

implicit class ByteArrayOps(val bytes: Array[Byte]) extends AnyVal {
  def toHexString: String = new String(toHex)
  def toHex: Array[Char] = ???
}
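To see the extension-method pattern in action before toHex is filled in, here is a minimal, self-contained sketch with a deliberately trivial method. The names Extensions, ByteArraySizeOps and sizeInBits are hypothetical and not part of the original code; they only illustrate how a method becomes available on Array[Byte] once the implicit class is in scope.

```scala
object Extensions {
  // Implicit value class wrapping Array[Byte]; extending AnyVal lets the
  // compiler avoid allocating the wrapper in most cases.
  implicit class ByteArraySizeOps(val bytes: Array[Byte]) extends AnyVal {
    // Hypothetical helper for illustration only.
    def sizeInBits: Int = bytes.length * 8
  }
}

object ExtensionsDemo {
  import Extensions._ // brings the implicit class into scope

  def main(args: Array[String]): Unit = {
    val bytes = Array[Byte](0, 1, 15, 16)
    // Reads as if sizeInBits were defined on Array[Byte] itself.
    println(bytes.sizeInBits) // prints 32
  }
}
```

The implicit class is placed inside an object because value classes may not be nested inside other classes, and Scala 2 does not allow implicit classes at the top level.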

We are going to provide two implementations, one using a high-level and functional approach and one using imperative programming tuned for arrays.

Benchmarking

As we are interested in the performance of the different implementations, we obviously have to run some benchmarks. For the JVM, JMH is the de-facto standard for micro-benchmarks and since my former coworker Konrad Malawski has created sbt-jmh – an sbt-plugin for JMH – we can easily run JMH benchmarks from sbt, which is the build tool of our choice. All we have to do is add the following line to the plugins definition of our sbt project, which resides under project/plugins.sbt:

addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.2.6")

Functional Approach

As mentioned, we can treat an array as a sequence and hence we can use the aforementioned method flatMap to transform the given Array[Byte]:

def toHex: Array[Char] = bytes.flatMap { byte =>
  val high = digits((byte & 0xF0) >>> 4) // digits is Array('0', '1', ..., 'E', 'F')
  val low = digits(byte & 0x0F)
  Array(high, low)
}

For each byte from the given input we calculate the high and low hex character simply by using a suitable bit mask, some bit shifting when needed and a lookup of the appropriate hex character. Then we return a new array consisting of the two hex characters which is the reason why we have to use flatMap instead of map.
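The masking and shifting can be tried out on single bytes. The following is a small sketch; the object name HexDigitDemo and the helper hexPair are hypothetical, and digits is assumed to be the sixteen-element lookup table described in the comment above.

```scala
object HexDigitDemo {
  // Assumed lookup table: index 0..15 maps to '0'..'9', 'A'..'F'.
  val digits: Array[Char] = "0123456789ABCDEF".toCharArray

  // Returns the (high, low) hex characters of a single byte.
  def hexPair(byte: Byte): (Char, Char) = {
    // `byte & 0xF0` promotes the byte to Int and keeps only bits 4..7,
    // so negative bytes also yield a value between 0 and 240.
    val high = digits((byte & 0xF0) >>> 4)
    // `byte & 0x0F` keeps only bits 0..3, i.e. a value between 0 and 15.
    val low = digits(byte & 0x0F)
    (high, low)
  }

  def main(args: Array[String]): Unit = {
    val samples = Array[Byte](15, 16, -1)
    samples.foreach(b => println(hexPair(b)))
    // prints (0,F), (1,0) and (F,F), one per line
  }
}
```

The last sample shows why the mask matters: the byte -1 is sign-extended to the Int 0xFFFFFFFF, and only the mask reduces it to a valid index into digits.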

For anybody familiar with the Scala collections, this approach should look straightforward. It should also be obvious that this implementation creates one small intermediate array for each element of the given input. With a little pondering it should also become clear that flatMap does not know the size of the result in advance and therefore cannot preallocate a single array for the return value, but instead has to build the result incrementally. Hence we expect a lot of short-lived allocations and a substantial amount of copying, two factors which might negatively impact performance. But let’s wait and see what the benchmarks tell us.

Imperative Approach

Now, instead of transforming the given input, we preallocate the result – which we can do because we know its size for this special case – and use a loop to index into the input and result:

def toHex: Array[Char] = {
  val hex = Array.ofDim[Char](bytes.length * 2) // 2 hex chars for each byte
  var n = 0
  while (n < bytes.length) {
    hex(n * 2) = digits((bytes(n) & 0xF0) >>> 4)
    hex(n * 2 + 1) = digits(bytes(n) & 0x0F)
    n += 1
  }
  hex
}

Of course this code is harder to understand, because instead of simply declaring what needs to be done it expresses in great detail how to compute the result. On the other hand we may expect better performance, because we don’t allocate any arrays except for the final result and perform all element access via index which is known to be very fast for arrays.
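Before comparing performance, it is worth checking that both variants produce the same output. The sketch below puts them side by side as plain functions; the names HexComparison, toHexFunctional and toHexImperative are hypothetical, and the definition of digits is an assumption based on the comment in the functional snippet.

```scala
object HexComparison {
  // Assumed definition of the lookup table referenced as `digits` above.
  private val digits: Array[Char] = "0123456789ABCDEF".toCharArray

  // Functional variant: one two-element array per input byte, flattened by flatMap.
  def toHexFunctional(bytes: Array[Byte]): Array[Char] =
    bytes.flatMap { byte =>
      Array(digits((byte & 0xF0) >>> 4), digits(byte & 0x0F))
    }

  // Imperative variant: preallocated result, filled by index in a while loop.
  def toHexImperative(bytes: Array[Byte]): Array[Char] = {
    val hex = Array.ofDim[Char](bytes.length * 2) // 2 hex chars for each byte
    var n = 0
    while (n < bytes.length) {
      hex(n * 2) = digits((bytes(n) & 0xF0) >>> 4)
      hex(n * 2 + 1) = digits(bytes(n) & 0x0F)
      n += 1
    }
    hex
  }

  def main(args: Array[String]): Unit = {
    val input = Array[Byte](0, 1, 15, 16)
    println(new String(toHexFunctional(input))) // prints 00010F10
    println(new String(toHexImperative(input))) // prints 00010F10
  }
}
```

Both functions return the sequence from the use case description for the input Array[Byte](0, 1, 15, 16), so any difference in the benchmark comes from how the result is built, not from what is computed.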

Benchmark Results

Using the sbt-jmh plugin we can run some benchmarks. For a byte array of size 1,024 we get the following results:

jmh:run -wi 10 -i 10 -f 2 -t 1
...
[info] Benchmark                        Mode  Cnt       Score      Error  Units
[info] Benchmarks.benchmarkImperative  thrpt   20  544482.362 ± 4692.282  ops/s
[info] Benchmarks.benchmarkNaive       thrpt   20   40273.748 ±  733.762  ops/s

Of course we all know that we have to be very careful when interpreting results of micro-benchmarks. Nevertheless these results clearly show that the imperative approach achieves roughly 13.5 times the throughput of the functional one (about 544,000 vs. 40,000 ops/s), i.e. a bit more than one order of magnitude, which matches our earlier assumptions.

Conclusion

We have shown that in some situations an imperative approach using arrays can be much more performant than using the functional collection API. Of course this is not to promote imperative programming, but instead to show the flexibility of Scala and the freedom to pick the right tools.

The full source code is on GitHub. As always, comments are welcome.

Blog author

Heiko Seeberger
