Resilience design patterns: retry, fallback, timeout, circuit breaker

24.6.2019 | 10 minutes reading time

What is resilience?

Software is not an end in itself: it supports your business processes and makes customers happy. If software is not running in production, it cannot generate value. Software running in production, however, also has to be correct, reliable, and available.

When it comes to resilience in software design, the main goal is to build robust components that can tolerate faults within their own scope as well as failures of other components they depend on. Techniques such as automatic fail-over or redundancy can make individual components fault-tolerant, but almost every system is distributed nowadays. Even a simple web application can involve a web server, a database, firewalls, proxies, load balancers, and cache servers. On top of that, the network infrastructure itself consists of so many components that some failure is always happening somewhere.

Besides total failures, services might also take longer than expected to respond. They might even return responses whose format is correct but whose content is semantically wrong. And again, the more components a system has, the more likely it is that something will fail.

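Availability is often considered an important quality attribute. It expresses the amount of time a component is actually available, compared to the amount of time the component is supposed to be available. It can be expressed with the following formula, using the mean time to failure (MTTF) and the mean time to recovery (MTTR):

$$\text{Availability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$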

Traditional approaches aim at increasing the uptime, while modern approaches aim for reduced recovery times and thus downtimes. This is useful because it allows us to deal with failures rather than trying to prevent them at all costs and being unavailable for a long time in case they do happen. Uwe Friedrichsen categorizes resilience design patterns into four categories: Loose coupling, isolation, latency control, and supervision.
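
To make the trade-off between uptime and recovery time concrete, here is a small Kotlin sketch. It assumes the common MTTF / (MTTF + MTTR) formulation of availability; the function names and example numbers are purely illustrative.

```kotlin
// Hours in a (non-leap) year, used to translate availability into downtime
const val HOURS_PER_YEAR = 8_760.0

// Availability as the share of time a component is up, computed from the
// mean time to failure (MTTF) and mean time to recovery (MTTR), both in hours.
fun availability(mttfHours: Double, mttrHours: Double): Double {
    require(mttfHours > 0.0 && mttrHours >= 0.0) { "MTTF must be positive, MTTR non-negative" }
    return mttfHours / (mttfHours + mttrHours)
}

// Expected downtime per year implied by a given availability
fun yearlyDowntimeHours(availability: Double): Double = (1.0 - availability) * HOURS_PER_YEAR
```

With one failure every 1000 hours, cutting the recovery time from 10 hours to 1 hour raises availability from about 99.0% to about 99.9%. Doubling the MTTF to 2000 hours while keeping a 10-hour recovery only reaches about 99.5%, which illustrates why reducing recovery time is an attractive lever.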

In this blog post we want to take a look at four patterns from the latency control category: retry, fallback, timeout, and circuit breaker. After a theoretical introduction we will see how these patterns can be applied in practice using Eclipse Vert.x. We close the post by discussing alternative implementations and summarizing the findings.

The patterns

Example scenario

To illustrate how the patterns work, we will use a very simple example. Imagine a payment service as part of a shopping platform. When a client wants to make a payment, the payment service should make sure the transaction is not fraudulent. To do that, it queries a fraud check service.

In this case our services offer HTTP-based interfaces. To check the transaction, the payment service sends an HTTP request to the fraud check service. If everything works well, the response has status 200 and contains a boolean indicating whether the transaction is fraudulent. But what if the fraud check service does not answer? What if it returns an internal server error (500)?
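
The possible outcomes of a fraud check call can be modelled explicitly, which makes the failure cases that the patterns below deal with visible in the type system. This is a hypothetical sketch: the `FraudCheckOutcome` type, the `classify` function, and the assumption that the response body is a bare `true` or `false` are illustrative and not part of the original example code.

```kotlin
// Possible outcomes of a call to the (hypothetical) fraud check endpoint
sealed class FraudCheckOutcome {
    data class Verdict(val fraudulent: Boolean) : FraudCheckOutcome() // 200 with a valid boolean
    data class Malformed(val body: String?) : FraudCheckOutcome()     // 200, but the body is not a boolean
    data class ErrorStatus(val status: Int) : FraudCheckOutcome()     // e.g. 500 internal server error
    object NoResponse : FraudCheckOutcome()                           // no answer at all
}

// Maps a raw HTTP status and body to an outcome; a null status means no response arrived.
// toBooleanStrictOrNull (Kotlin 1.5+) accepts only exactly "true" or "false".
fun classify(status: Int?, body: String?): FraudCheckOutcome {
    if (status == null) return FraudCheckOutcome.NoResponse
    if (status != 200) return FraudCheckOutcome.ErrorStatus(status)
    val verdict = body?.trim()?.toBooleanStrictOrNull()
    return if (verdict != null) FraudCheckOutcome.Verdict(verdict) else FraudCheckOutcome.Malformed(body)
}
```

Treating an unparseable body as `Malformed` rather than silently as "not fraudulent" reflects the earlier point that a response can be well-formed yet semantically wrong.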

Let’s now take a look at four concrete patterns that address these communication issues. While this is a specific example, the same considerations apply to any other setup that involves communication with an unreliable service over an unreliable channel.

Retry

Whenever we assume that an unexpected response, or no response at all, can be fixed by sending the request again, the retry pattern can help. It is a very simple pattern: a failed request is retried a configurable number of times before the operation is marked as failed.

The following animation illustrates the payment service attempting to issue a fraud check. The first request fails due to an internal server error in the fraud check service. The payment service retries the request and receives the answer that the transaction is not fraudulent.
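
A minimal retry helper can be sketched in plain Kotlin as follows. The names `retry`, `ServerErrorException`, and `FlakyFraudCheck` are hypothetical; the flaky service simply simulates the animation by failing the first call with a 500 and answering "not fraudulent" afterwards.

```kotlin
// Hypothetical exception representing an HTTP error status from the remote service
class ServerErrorException(val status: Int) : Exception("HTTP $status")

// Runs `operation`; on failure, retries it up to `maxRetries` more times.
// If every attempt fails, the last exception is rethrown.
fun <T> retry(maxRetries: Int, operation: () -> T): T {
    require(maxRetries >= 0) { "maxRetries must not be negative" }
    var lastError: Exception? = null
    repeat(maxRetries + 1) {
        try {
            return operation()
        } catch (e: Exception) {
            lastError = e
        }
    }
    throw lastError!!
}

// Hypothetical stand-in for the fraud check service from the animation:
// the first call fails with a 500, every later call answers "not fraudulent".
class FlakyFraudCheck {
    var calls = 0
        private set

    fun isFraudulent(amountCents: Long): Boolean {
        calls++
        if (calls == 1) throw ServerErrorException(500)
        return false
    }
}
```

Calling `retry(maxRetries = 2) { service.isFraudulent(1_000) }` on a fresh `FlakyFraudCheck` returns `false` after two calls, mirroring the animation.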

Retries can be useful in case of:

Temporary network problems such as packet loss
Internal errors of the target service, e.g. caused by an outage of a database
No or slow responses due to a large number of requests towards the target service

Keep in mind, however, that if the problems are caused by the target service being overloaded, retrying might make those problems even worse. To avoid turning your resilience pattern into a denial of service attack, retry can be combined with other techniques such as exponential backoff or a circuit breaker (see below).
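
One way to combine retries with exponential backoff is sketched below. It is an illustration under assumptions rather than a production implementation: the name `retryWithBackoff`, the doubling factor, and the cap are choices made for this example, and the injectable `sleep` parameter only exists so the waiting can be observed without actually blocking.

```kotlin
// Retries `operation` up to `maxRetries` times, waiting between attempts.
// The wait doubles after each failure (base, 2*base, 4*base, ...) and is capped at maxDelayMs.
fun <T> retryWithBackoff(
    maxRetries: Int,
    baseDelayMs: Long,
    maxDelayMs: Long,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    operation: () -> T
): T {
    require(maxRetries >= 0) { "maxRetries must not be negative" }
    var lastError: Exception? = null
    for (attempt in 0..maxRetries) {
        try {
            return operation()
        } catch (e: Exception) {
            lastError = e
            if (attempt < maxRetries) {
                // The shift is bounded so large attempt counts cannot overflow the Long
                val delay = minOf(maxDelayMs, baseDelayMs shl minOf(attempt, 30))
                sleep(delay)
            }
        }
    }
    throw lastError!!
}
```

The growing pauses give an overloaded target service time to recover instead of hitting it with a burst of immediate retries. Real implementations often also add random jitter to the delay so that many clients do not retry in lockstep.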

Fallback

The fallback pattern enables your service to continue the execution in case of a failed request to another service. Instead of aborting the computation because of a missing response, we fill in a fallback value.

The following animation again depicts the payment service issuing a request to the fraud check service. Again, the fraud check service returns an internal server error. This time, however, we have a fallback in place which assumes that the transaction is not fraudulent.

Fallback values are not always possible but can greatly increase your overall resilience if used carefully. In the example above, falling back to treating the transaction as not fraudulent when the fraud check service is unavailable can be dangerous. It even opens up an attack surface: an attacker could first flood the fraud check service with requests and then place the fraudulent transaction.

On the other hand, if the fallback is to assume that every transaction is fraudulent, no payment will go through and the fallback is essentially useless. A good compromise might be to fall back to a simple business rule, e.g. letting only transactions with a reasonably small amount through, to strike a good balance between risk and losing customers.
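
The business-rule compromise can be sketched as a fallback that only kicks in when the fraud check call fails. The threshold `SMALL_AMOUNT_LIMIT_CENTS` and its value of 50.00 are hypothetical, as is passing the fraud check in as a plain function.

```kotlin
// Hypothetical threshold: transactions up to 50.00 count as "reasonably small"
const val SMALL_AMOUNT_LIMIT_CENTS = 5_000L

// Asks the fraud check; if that call fails, falls back to a simple business rule
// instead of blindly accepting or rejecting every transaction.
fun isFraudulentWithFallback(amountCents: Long, fraudCheck: (Long) -> Boolean): Boolean =
    try {
        fraudCheck(amountCents)
    } catch (e: Exception) {
        // Fallback rule: let small amounts through, treat larger ones as suspicious
        amountCents > SMALL_AMOUNT_LIMIT_CENTS
    }
```

Which threshold is acceptable is a business decision rather than a technical one; the code only makes that decision explicit.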

Timeout

The timeout pattern is pretty straightforward, and many HTTP clients have a default timeout configured. The goal is to avoid unbounded waiting times by treating every request as failed if no response is received within the timeout.

The animation below shows the payment service waiting for the response from the fraud check service and aborting the operation once the timeout is exceeded.

Timeouts are used in almost every application to avoid requests getting stuck forever. Dealing with timeouts is not trivial, however. Imagine an order placement timing out in an online shop. You cannot tell whether the order was placed successfully and only the response timed out, whether the order creation is still in progress, or whether the request was never processed at all. If you combine the timeout with a retry, you might end up with a duplicate order. If you mark the order as failed, the customer might think the order did not go through although it actually did, and they will get charged.

Also, your timeouts should be high enough to allow slower responses to arrive, but low enough to stop waiting for a response that is never going to arrive.
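
A timeout can be sketched with the JDK's `CompletableFuture.get(timeout, unit)`, which throws a `TimeoutException` when no result arrives in time. The `CallResult` type and the `callWithTimeout` function are hypothetical; the point is to keep "timed out" separate from "failed", because, as in the order example, a timeout does not tell you whether the remote operation actually happened.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ExecutionException
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

// Outcome of a call: a value, a failure reported by the call, or no answer in time
sealed class CallResult<out T> {
    data class Success<T>(val value: T) : CallResult<T>()
    data class Failure(val cause: Throwable) : CallResult<Nothing>()
    object TimedOut : CallResult<Nothing>()
}

// Runs `call` asynchronously and waits at most `timeoutMs` milliseconds for its result.
fun <T> callWithTimeout(timeoutMs: Long, call: () -> T): CallResult<T> {
    val future = CompletableFuture.supplyAsync { call() }
    return try {
        CallResult.Success(future.get(timeoutMs, TimeUnit.MILLISECONDS))
    } catch (e: TimeoutException) {
        // Stops waiting only; CompletableFuture does not interrupt the running task
        future.cancel(true)
        CallResult.TimedOut
    } catch (e: ExecutionException) {
        CallResult.Failure(e.cause ?: e)
    }
}
```

Note that cancelling the future does not stop the work in the background, so the remote call may still complete after the caller has given up, which is exactly the ambiguity described above.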

Circuit breaker

In electronics, a circuit breaker is a switch that protects your components from damage through overload. In software, a circuit breaker protects your services from being flooded with requests while they are already partly unavailable due to high load.

The circuit breaker pattern was described by Martin Fowler. It can be implemented as a stateful software component that switches between three states: closed (requests can flow freely), open (requests are rejected without being submitted to the remote resource), and half-open (one probe request is allowed to decide whether to close the circuit again). The animation below illustrates a circuit breaker in action.

The request from the payment service to the fraud check service is passed through the circuit breaker. After two internal server errors the circuit opens and subsequent requests are blocked. After some waiting time the circuit goes to the half-open state. In this state it will allow one request to pass and change back to the open state in case it fails, or to closed in case of success. The next request succeeds so the circuit is closed again.
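
The state machine described above can be sketched in a few lines of Kotlin. This is a simplified, single-threaded illustration with hypothetical names (`SimpleCircuitBreaker`, `CircuitState`, `CircuitOpenException`); the injectable `clock` only exists to make the reset timeout testable. It is not thread-safe, so the "one probe request" guarantee of the half-open state only holds for sequential callers.

```kotlin
enum class CircuitState { CLOSED, OPEN, HALF_OPEN }

class CircuitOpenException : RuntimeException("Circuit is open, request rejected")

// Simplified, single-threaded circuit breaker (not thread-safe)
class SimpleCircuitBreaker(
    private val maxFailures: Int,
    private val resetTimeoutMs: Long,
    private val clock: () -> Long = System::currentTimeMillis
) {
    var state = CircuitState.CLOSED
        private set
    private var failures = 0
    private var openedAt = 0L

    fun <T> call(operation: () -> T): T {
        if (state == CircuitState.OPEN) {
            // Before the reset timeout: reject without contacting the remote service.
            // Afterwards: let one probe request through (half-open).
            if (clock() - openedAt >= resetTimeoutMs) {
                state = CircuitState.HALF_OPEN
            } else {
                throw CircuitOpenException()
            }
        }
        return try {
            val result = operation()
            // Success closes the circuit (also after a half-open probe) and resets the count
            failures = 0
            state = CircuitState.CLOSED
            result
        } catch (e: Exception) {
            failures++
            // A failed half-open probe, or too many failures while closed, opens the circuit
            if (state == CircuitState.HALF_OPEN || failures >= maxFailures) {
                state = CircuitState.OPEN
                openedAt = clock()
            }
            throw e
        }
    }
}
```

With `maxFailures = 2`, two internal server errors open the circuit, further calls fail fast with `CircuitOpenException` until the reset timeout has passed, and a successful probe closes it again, as in the animation.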

Circuit breakers are a useful tool, especially when combined with retries, timeouts and fallbacks. Fallbacks can be used not only in case of failures, but also if the circuit is open. In the next section we will take a look at a code example with Vert.x written in Kotlin.

Implementation in Vert.x

In the last section we took a look at different resilience patterns from a theoretical point of view. Now let’s see how you can implement them. The source code of the example is available on GitHub. We will use Vert.x with Kotlin for this showcase. Other alternatives are discussed in the next section.

Vert.x offers CircuitBreaker, a powerful decorator class that supports arbitrary combinations of retry, fallback, timeout, and circuit breaker configurations. You can configure the circuit breaker using the CircuitBreakerOptions class as shown below.

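import io.vertx.circuitbreaker.CircuitBreaker
import io.vertx.core.Vertx
import io.vertx.kotlin.circuitbreaker.circuitBreakerOptionsOf

val vertx = Vertx.vertx()
val options = circuitBreakerOptionsOf(
    fallbackOnFailure = false, // call the fallback only while the circuit is open
    maxFailures = 1,           // open the circuit after one failed operation
    maxRetries = 2,            // retry a failed operation twice before counting it as failed
    resetTimeout = 5000,       // move from open to half-open after 5000 ms
    timeout = 2000             // treat an operation as failed if it takes longer than 2000 ms
)
val circuitBreaker = CircuitBreaker.create("my-circuit-breaker", vertx, options)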

In this example we create a circuit breaker that retries an operation twice before treating it as failed. After one failure the circuit opens, and after 5000 ms it becomes half-open again. Operations time out after 2000 ms. If a fallback is specified, it will be called only when the circuit is open. Setting fallbackOnFailure to true makes the circuit breaker call the fallback on failures even while the circuit is closed.

In order to execute a command, we need to provide the asynchronous piece of code to execute, of type Handler<Future<T>>, as well as a handler of type Handler<AsyncResult<T>> that processes the result. A minimal example that returns OK and prints it afterwards looks like this:

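import io.vertx.core.AsyncResult
import io.vertx.core.Future
import io.vertx.core.Handler

circuitBreaker.executeCommand(
    // The command: asynchronous code that completes the future it is given
    Handler<Future<String>> {
        it.complete("OK")
    },
    // The result handler receives an AsyncResult, so unwrap it before printing
    Handler<AsyncResult<String>> {
        if (it.succeeded()) println(it.result()) else println("Failed: ${it.cause()}")
    }
)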

When working with Vert.x in Kotlin you can also pass suspend functions as arguments instead of working with handlers. Please refer to the CoroutineHandlerFactory class and its usages for more details. In addition to these basic features, the Vert.x circuit breaker module offers the following advanced features:

Event bus notifications. The circuit breaker can publish an event to the event bus on every state change. This is useful if other parts of your system need to react to those state changes.
Metrics. The circuit breaker can publish metrics to be consumed by the Hystrix dashboard to visualize the state of your circuit breakers.
State change callbacks. You can configure custom handlers to be invoked when the circuit opens, closes, or becomes half-open.

Alternative implementation approaches

Not every framework supports resilience design patterns out of the box, and Vert.x does not support every possible pattern either. There are dedicated projects addressing resilience directly, such as Hystrix, resilience4j, failsafe, and the resilience features of Istio.

Hystrix has been used in many applications but is no longer under active development. Hystrix, resilience4j, and failsafe are called directly from within the application source code. You can integrate them by implementing interfaces or using annotations, for example.

Istio, on the other hand, is a service mesh and thus part of the infrastructure rather than the application code. It is used to orchestrate a distributed system of services and implements the concept of a sidecar. Service communication happens through that sidecar, which is a dedicated process running alongside the service process. The sidecar can then handle mechanisms such as retries.

The advantage of a sidecar approach is that you do not mix business logic with resilience logic. You can replace the sidecar technology without touching much of the application code. Additionally, you can easily modify and adapt the sidecar configuration without redeploying the service. The disadvantage is that some patterns cannot be applied this way, such as the bulkhead pattern for thread pool isolation. Additionally, patterns like fallback values depend heavily on your business logic. It might also be easier to extend your existing code base than to add a new infrastructure component.

Summary

In this post we have looked at four patterns from the latency control category, which Uwe Friedrichsen lists alongside loose coupling, isolation, and supervision as ways to improve system resilience. The retry pattern enables dealing with communication errors that can be corrected by attempting the request multiple times. The fallback pattern helps resolve communication failures locally. The timeout pattern provides an upper bound to latency. The circuit breaker prevents accidental denial of service attacks caused by retries and enables fast fallbacks when communication errors persist.

Frameworks like Vert.x provide some resilience patterns out of the box. There are also dedicated resilience libraries which can be used with any framework. Service meshes on the other hand exist as an option to introduce resilience patterns on an infrastructure level. As always there is no one-size-fits-all solution and your team should figure out what works best for them.

Blog authors

Frank Rosner

Alexander Potukar
