Explain non-blocking I/O like I’m five

4.4.2019 | 8 minutes reading time

Introduction

Ten years ago there was a major shift in the field of network application development. In 2009 Ryan Dahl invented Node.js because he was not happy with the limited ability of the popular Apache HTTP Server to handle thousands of concurrent connections. The Node.js project combined a JavaScript engine, an event loop, and an I/O layer. It is commonly referred to as a non-blocking web server.

The idea of non-blocking I/O in combination with an event loop is not new. The Java community added the NIO module to J2SE 1.4 back in 2002. Netty, a non-blocking I/O client-server framework for the development of Java network applications, has been actively developed since 2004. Operating systems have offered functionality to notify a program as soon as a socket is readable or writable for even longer than that.

Nowadays you often hear or read comments like “X is a non-blocking, event-driven, scalable, [insert another buzzword here] framework”. But what does it mean and why is it useful? The remainder of this post is structured as follows. The next section illustrates the concept of non-blocking I/O with a simple analogy. Afterwards we discuss advantages and disadvantages of non-blocking I/O. The section after that gives a glimpse into how non-blocking I/O is implemented in different operating systems. We conclude the post with some final thoughts.

Your own table factory

Your first employee and work bench

Imagine you are starting a business which produces tables. You are renting a small building and buying a single work bench because you only have one employee, let’s call him George. In the morning, George enters the building, goes to the work bench, and picks a new order from the inbox.

Tables vary in size and color. The respective resources and supplies are available in the store room. Sometimes, however, the store room does not have the required materials, e.g. a color is missing, so George has to order new supplies. Because George likes to finish one thing before he starts another, he will simply wait at the work bench until the new supplies are delivered.

In this analogy, the factory represents a computer system, the work bench represents your CPU, and George is a worker thread. Ordering new supplies corresponds to an I/O operation, and you can be seen as the operating system, coordinating all the interactions. In this first setup the CPU has no multi-tasking capabilities, so every operation blocks not only a thread but the whole CPU and thus the whole computer.

Multiple employees, single workbench

You are wondering if you could increase productivity by convincing George to work on something else while the supplies are being delivered. It can take multiple days before a new delivery arrives, and George will just stand there doing nothing. You confront him with your new plan but he replies: “I’m really bad at context switching, boss. But I’d be happy to go home and do nothing there so I’m at least not blocking the work bench!”

You realize that this is not what you had hoped for, but at least you can hire another employee to work at the bench while George is at home, waiting for the delivery. You hire Gina, and she assembles another table while George is at home. Sometimes George has to wait for Gina to finish a table before he can continue his work, but productivity almost doubles nevertheless, because George’s waiting time is utilized much better.

By having multiple employees share the same workbench we introduced a form of multi-tasking. There are different multi-tasking techniques and here we have a very basic one: as soon as a thread is blocked waiting for I/O, it can be parked and another thread can use the CPU. In I/O heavy applications, however, this approach requires us to hire more employees (spawn more threads) who will spend most of their time waiting. Hiring workers is expensive. Is there another way to increase productivity?
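The thread-per-task model described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `orderSupplies` is a hypothetical stand-in for a blocking I/O call (it simply sleeps), and the class and method names are invented for this example. Note that the pool needs one thread per pending order, which is exactly the cost the analogy points at.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockingWorkers {

    // Hypothetical blocking I/O call: the calling thread is parked until the "delivery" arrives.
    static String orderSupplies(String color, long deliveryMillis) throws InterruptedException {
        Thread.sleep(deliveryMillis);
        return color + " paint";
    }

    // Builds one table per order. Every order occupies its own thread while it waits.
    static List<String> buildTables(List<String> colors, long deliveryMillis) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(colors.size());
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String color : colors) {
                pending.add(workers.submit(() -> "table with " + orderSupplies(color, deliveryMillis)));
            }
            List<String> tables = new ArrayList<>();
            for (Future<String> table : pending) {
                tables.add(table.get()); // blocks until that worker is done
            }
            return tables;
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildTables(List.of("red", "blue"), 100));
    }
}
```

While a worker sleeps inside `orderSupplies`, the operating system can schedule another thread on the CPU, but the sleeping thread still exists and still consumes memory for its stack.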

Multitasking, non-blocking employees

In her second week, Gina also runs out of supplies. She realizes that it is actually not that bad to simply work on another table while waiting for the delivery. So she asks you to send her a text message when the delivery has arrived, so that she can continue working on that table as soon as she finishes her current work or is waiting for another delivery.

Now Gina is utilizing the work bench from 9 to 5 and George realizes that she is way more productive than him. He decides to change jobs, but luckily Gina has a friend who is as flexible as her and thanks to all the tables you sold you can afford a second work bench. Now each work bench has an employee working the whole day, utilizing waiting time for supply deliveries to work on another order in the meantime. Thanks to your notification on arrived deliveries they can focus on their work and do not have to check the delivery status on a regular basis.

After changing the working mode to no longer idle when waiting for deliveries, your employees are performing I/O in a non-blocking way. Although George was also no longer blocking the CPU after he started waiting for the delivery at home, he was still waiting and thus blocked. Gina and her friend simply work on something else: they suspend the assembly of the table that requires supplies and wait for the operating system to signal them that the I/O result is ready.
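To make the "text message" idea a bit more concrete, here is a minimal sketch using Java NIO. It assumes an in-process `Pipe` as a stand-in for a real network connection, and the class name and message are made up for illustration. The source channel is switched to non-blocking mode and registered with a `Selector`, which plays the role of the notification from the operating system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class DeliveryNotification {

    static String waitForDelivery() throws IOException {
        Pipe pipe = Pipe.open(); // stand-in for a socket; an assumption of this sketch
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {

            source.configureBlocking(false);
            source.register(selector, SelectionKey.OP_READ);

            ByteBuffer buffer = ByteBuffer.allocate(64);
            // Nothing has been delivered yet: a non-blocking read returns immediately.
            int bytesBefore = source.read(buffer);

            // The "supplier" delivers. In a real application this is the remote peer.
            sink.write(ByteBuffer.wrap("blue paint".getBytes(StandardCharsets.UTF_8)));

            // The "text message": select() returns once a registered channel is ready.
            selector.select();
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isReadable()) {
                    source.read(buffer);
                }
            }
            selector.selectedKeys().clear();

            buffer.flip();
            return bytesBefore + " bytes before delivery, then: " + StandardCharsets.UTF_8.decode(buffer);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(waitForDelivery());
    }
}
```

In a real event loop the thread would register many channels with the same selector and handle whichever one becomes ready, instead of writing to the channel itself as this single-threaded demo does.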

Benefits of non-blocking I/O

I hope the previous analogy made it clear what the basic idea of non-blocking I/O is. But when is it useful? Generally one can say that the benefit starts kicking in once your workload is heavily I/O bound, meaning that your CPU would otherwise spend a lot of time waiting, for example for your network interfaces.

Using non-blocking I/O in the right situation will improve throughput, latency, and/or responsiveness of your application. It also allows you to work with a single thread, potentially getting rid of synchronization between threads and all the problems associated with it. Node.js runs your application code on a single thread, yet it can handle a very large number of concurrent connections with a few GB of RAM.

A common misconception is that non-blocking I/O means fast I/O. Just because your I/O is not blocking your thread, it does not get executed faster. As usual there is no silver bullet but only trade-offs. There is a nice blog post on TheTechSolo discussing advantages and disadvantages of different concepts around this topic.

Implementations

There are many different forms and implementations of non-blocking I/O. However, all major operating systems have built-in kernel functions that can be used to perform non-blocking I/O. epoll is commonly used on Linux systems and it was inspired by kqueue (research paper), which is available in BSD based systems (e.g. Mac OS X).

When using Java, the developer can rely on Java NIO. In most JVM implementations you can expect Java NIO to use those kernel functions if applicable. However there are some subtleties when it comes to the details. As the Java NIO API is generic enough to work on all operating systems, it cannot utilize some of the advanced features that individual implementations like epoll or kqueue provide. It resembles very basic poll semantics.
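If you are curious which selector implementation your JVM picked, the following sketch prints the class of the default `SelectorProvider`. The class name is a JDK-internal detail that varies between platforms and JDK versions, so treat the output as a hint rather than a guarantee. `WhichSelector` is just a name chosen for this example.

```java
import java.nio.channels.spi.SelectorProvider;

public class WhichSelector {

    // Returns the class name of the system-wide default selector provider.
    static String providerName() {
        return SelectorProvider.provider().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(System.getProperty("os.name") + " -> " + providerName());
    }
}
```

On many JDK builds the name mentions the underlying kernel mechanism, for example epoll on Linux or kqueue on macOS, but this is not part of the public API contract.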

Thus, if you are looking for a little bit of extra flexibility or performance, you might want to switch to native transports directly. Netty, one of the best network application frameworks on the JVM, supports both Java NIO transports and native libraries for Linux and Mac OS X.

Of course most of the time you are not going to work with Java NIO or Netty directly but use some web application framework. Some frameworks will allow you to configure your network layer to some extent. In Vert.x, for example, you can choose whether you want to use native transports if applicable, and it offers

EpollTransport based on Netty's EpollEventLoopGroup,
KQueueTransport based on KQueueEventLoopGroup, and
Transport based on NioEventLoopGroup.

Final thoughts

The term non-blocking is used in many different ways and contexts. In this post we focused on non-blocking I/O, which refers to threads not waiting for I/O operations to finish. Sometimes, however, people refer to APIs as non-blocking only because they do not block the current thread. That doesn’t necessarily mean they perform non-blocking I/O.

Take JDBC as an example. JDBC is blocking by definition. However there is a JDBC client out there which has an asynchronous API. Does it block your thread while waiting for the response of the database? No! But as I mentioned earlier, JDBC is blocking by definition so who is blocking? The trick here is simply to have a second thread pool that will take the JDBC requests and block instead of your main thread.

Why is that helpful? It allows you to keep doing your main work, e.g. answering HTTP requests. If not every request needs a JDBC connection, you can still answer those with your main thread while your thread pool is blocked. This is nice, but it is still blocking I/O and you will run into bottlenecks as soon as your work becomes bound by the JDBC communication.
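The trick of hiding a blocking call behind an asynchronous API can be sketched with plain `CompletableFuture`. This is an assumption-laden illustration, not the implementation of any particular JDBC client: `blockingQuery` is a hypothetical stand-in that sleeps instead of talking to a database, and `dbPool` is the second thread pool mentioned above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncOverBlocking {

    // Hypothetical stand-in for a blocking JDBC query: it parks the calling thread.
    static String blockingQuery(String sql) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result of " + sql;
    }

    // Asynchronous facade: the caller is not blocked, but a thread from dbPool is.
    static CompletableFuture<String> queryAsync(String sql, ExecutorService dbPool) {
        return CompletableFuture.supplyAsync(() -> blockingQuery(sql), dbPool);
    }

    public static void main(String[] args) {
        ExecutorService dbPool = Executors.newFixedThreadPool(2);
        CompletableFuture<String> pending = queryAsync("SELECT 1", dbPool);
        System.out.println("main thread keeps answering other requests");
        System.out.println(pending.join());
        dbPool.shutdown();
    }
}
```

With a pool of two threads, a third concurrent query has to queue until one of the blocked threads returns, which is the bottleneck described above.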

The field is very broad and there are many more details to explore. I believe however that with a basic understanding of blocking vs. non-blocking I/O you should be able to ask the right questions when you run into performance problems. Did you ever use native transports in your application? Did you do it because you could or because you were fighting with performance issues? Let me know in the comments!

Cover image by Paul Englefield

Blog author

Frank Rosner
