Persistence without Persistence

17.6.2012 | 6 minutes reading time

NoSQL databases typically run on virtual machines in the cloud. But if the machines they run on are virtual, how can persistence be ensured?

Enterprise relational database management systems typically run on expensive, robust, and highly reliable hardware. Large sums are frequently invested to make the hardware as fail-safe as possible, and a typical DB admin would insist on such measures.

In the cloud, we rarely find this situation. Cloud computing hardware is typically commodity hardware. Certainly, none of the cloud computing providers use cheap hardware from “your dealer around the corner”, simply because it would be far too expensive to maintain. On the other hand, servers are typically not equipped with redundant power supplies, and disks are not combined into RAID arrays or other fault-tolerant systems.

As a consequence, service providers inform tenants that they cannot rely on all virtual nodes to work flawlessly. In fact, one should expect nodes to fail every once in a while. This has interesting consequences. If your DB server runs on a virtual machine, the hard disk the server writes to is virtual, too. Of course there is a physical disk behind it, but it is not accessible to you. If the node the virtual machine runs on goes down, so does the virtual machine. What happens to the data on the virtual hard disk? It is lost. Even if you restart the virtual machine on the same node, there is no way to access the data the DB server previously wrote to the virtual disk. To prevent this, you would have to take snapshots of the virtual machine (including the disk). With many write operations to the DB, these snapshots would have to be taken continuously at short intervals, which is obviously not viable.
In other words, there is no persistence available for DB servers running on virtual machines.

So, if virtual machines are transient, how can persistence be achieved? The answer is much the same as with dedicated hardware: data replication. Each individual datum stored in the DB should be stored on several different virtual machines on several different nodes. Threefold replication seems to be a kind of standard here. The idea behind this approach is that while we cannot rely on the individual machines, it is considered very unlikely that all three machines storing a datum go down at the same time. If only one or two machines fail, the datum is still available. Several NoSQL DB servers contain built-in mechanisms that automatically restore the replication factor when a node goes down; others leave this task to the application developers.
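The following minimal sketch models this idea in memory. It is not the mechanism of any particular NoSQL product: the class `ReplicatedStore` and its methods are hypothetical, each node's dictionary stands in for a transient virtual disk, and replica placement uses rendezvous hashing purely as one possible strategy. When a node fails, its data vanishes with it, and the store re-creates the missing copies on the remaining nodes, in the spirit of the built-in repair mechanisms mentioned above.

```python
import hashlib


class ReplicatedStore:
    """In-memory model of a cluster that keeps every datum on `factor` distinct nodes."""

    def __init__(self, nodes, factor=3):
        if factor > len(nodes):
            raise ValueError("replication factor exceeds number of nodes")
        self.factor = factor
        # node name -> data on that node's (virtual, transient) disk
        self.nodes = {name: {} for name in nodes}

    def _ranked(self, key):
        # Rendezvous hashing: order all live nodes by hash(key, node) so that
        # placement is deterministic and spreads keys across nodes.
        def score(node):
            return hashlib.sha256(f"{key}:{node}".encode()).hexdigest()
        return sorted(self.nodes, key=score)

    def put(self, key, value):
        for node in self._ranked(key)[: self.factor]:
            self.nodes[node][key] = value

    def get(self, key):
        for data in self.nodes.values():
            if key in data:
                return data[key]
        raise KeyError(key)

    def replicas(self, key):
        return [node for node, data in self.nodes.items() if key in data]

    def fail_node(self, node):
        # The node goes down and its virtual disk is lost with it.
        del self.nodes[node]
        self._repair()

    def _repair(self):
        # Copy every surviving datum to further nodes until it again has
        # `factor` replicas (or until no spare live nodes are left).
        keys = {key for data in self.nodes.values() for key in data}
        for key in keys:
            value = self.get(key)
            missing = self.factor - len(self.replicas(key))
            spare = [n for n in self._ranked(key) if key not in self.nodes[n]]
            for node in spare[:missing]:
                self.nodes[node][key] = value
```

With five nodes and `factor=3`, failing one of the three nodes that hold a key leaves the key readable, and `_repair` puts a third copy on one of the two spare nodes. A datum is lost only if all of its replicas fail before a repair runs; in this single-threaded model repair is instantaneous, which real systems cannot promise.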

Is this the end of the story? Unfortunately not. In August last year, Amazon’s European EC2 data center suffered an outage after lightning struck the transformers close to the site (see, e.g., here or here). The lightning also hit the secondary power supply, so the data center suffered a full power outage. I don’t actually care whether this is the right explanation for that particular incident. It is enough to see that a complete power outage of a data center is possible and not so unlikely that it can be ignored.

The obvious solution is to replicate data across data centers. But this is where the problems start. Replication within a single data center is relatively simple because all nodes are connected by high-bandwidth links, so heavy communication traffic between nodes is relatively unproblematic. Such bandwidth is no longer available between two different data centers, which may even be located on different continents. Bandwidth over the Internet is clearly the limiting factor in data replication: full real-time replication of huge data sets that change frequently is impossible.
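A back-of-the-envelope calculation makes the gap concrete. The sketch below assumes, purely for illustration, a 500 GB data set, a 10 Gbit/s link inside a data center and a 100 Mbit/s link between data centers. It ignores protocol overhead and latency, so real transfers take longer.

```python
def transfer_hours(data_gb: float, bandwidth_mbit_s: float) -> float:
    """Hours needed to move `data_gb` (decimal) gigabytes over a `bandwidth_mbit_s` Mbit/s link."""
    bits = data_gb * 8 * 1000**3
    seconds = bits / (bandwidth_mbit_s * 1000**2)
    return seconds / 3600


def can_keep_up(change_rate_mb_s: float, bandwidth_mbit_s: float) -> bool:
    """True if the link can carry a sustained write rate of `change_rate_mb_s` MB/s."""
    # Convert megabytes per second to megabits per second before comparing.
    return change_rate_mb_s * 8 <= bandwidth_mbit_s


# Assumed, illustrative figures (not measurements of any provider).
print(f"inside data center:   {transfer_hours(500, 10_000):.2f} h")
print(f"between data centers: {transfer_hours(500, 100):.2f} h")
print("20 MB/s of changes fit through 100 Mbit/s:", can_keep_up(20, 100))
```

With these assumed numbers, the initial copy takes under 7 minutes inside the data center but about 11 hours across the Internet, and a sustained change rate of 20 MB/s (160 Mbit/s) already exceeds the inter-data-center link, so the replication backlog would grow without bound.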

There is yet another effect to consider. If a data center remains down or unreachable for a prolonged period, its tenants will start moving their applications to other data centers of the same provider. The resulting overload may in turn make these data centers unreachable, too.

In such a situation, there is no uniform solution to cross data center replication that fits everybody’s needs. Rather, it is the individual requirements of applications that drive potential solutions. There are basically three types of data to distinguish:

data that does not require cross data center replication,
data that should be replicated across data centers sometime,
data that requires immediate cross data center replication.
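One way to make this classification explicit in code is to tag each kind of data with its replication class and derive the write path from the tag. The enum `ReplicationClass` and the function `write_steps` below are a hypothetical sketch, not part of any specific database.

```python
from enum import Enum


class ReplicationClass(Enum):
    LOCAL_ONLY = "no cross data center replication"
    EVENTUAL = "replicate across data centers sometime"
    IMMEDIATE = "replicate across data centers immediately"


def write_steps(cls: ReplicationClass) -> list:
    """Steps a write of the given class goes through before it is acknowledged."""
    # Every class is still replicated inside the local data center.
    steps = ["store on replicas in the local data center"]
    if cls is ReplicationClass.EVENTUAL:
        # Acknowledge now, ship to the remote data center later in a batch.
        steps.append("queue for later transfer to a remote data center")
    elif cls is ReplicationClass.IMMEDIATE:
        # Acknowledge only after the remote data center has confirmed.
        steps.append("wait for confirmation from a remote data center")
    return steps


for cls in ReplicationClass:
    print(cls.name, "->", write_steps(cls))
```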

Let us explain this by means of an example. Think of a web shop and a new customer trying to place an order. The items in the shopping cart are transient data anyway, so there is no need to replicate them across data centers. As long as the customer has not completed the order, losing it in the unlikely event of a data center outage is economically sustainable. The customer just has to re-enter what he wants to order. And if needed, the customer’s browser can serve as a backup by storing the cart content in a cookie.
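The cookie backup could look roughly like the following sketch, which uses Python's standard `http.cookies` module. The cookie name `cart`, the JSON-plus-Base64 encoding and the one-week lifetime are assumptions chosen for illustration.

```python
import base64
import json
from http.cookies import SimpleCookie


def cart_to_cookie(cart: dict) -> str:
    """Serialize the cart into a Set-Cookie header value (cookie name 'cart' is assumed)."""
    payload = base64.urlsafe_b64encode(json.dumps(cart).encode()).decode()
    cookie = SimpleCookie()
    cookie["cart"] = payload
    cookie["cart"]["path"] = "/"
    cookie["cart"]["max-age"] = 7 * 24 * 3600  # keep the backup for one week
    return cookie["cart"].OutputString()


def cart_from_cookie(header: str) -> dict:
    """Restore the cart from a Cookie request header, or return an empty cart."""
    cookie = SimpleCookie()
    cookie.load(header)
    if "cart" not in cookie:
        return {}
    return json.loads(base64.urlsafe_b64decode(cookie["cart"].value))
```

Because browsers typically limit a single cookie to about 4 KB and the client can modify its content, this only suits small carts, and restored items should be validated (for instance, prices re-read from the catalogue) before they are trusted.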

Customer address and payment information are data that should be replicated across data centers. After all, one wouldn’t want to lose all customers or their data because of a data center outage. But immediate replication is unlikely to be required; I’d rather propose eventual replication. If a data center outage happens, only a small amount of customer data is affected, namely the changes made after the last replication.
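A minimal sketch of eventual replication, assuming a hypothetical `send_batch` callable that transfers a list of changes to the remote data center: writes are recorded locally, a periodic job ships them in batches, and `exposure()` lists exactly the changes an outage would cost, i.e. everything recorded since the last successful transfer.

```python
class EventualReplicator:
    """Buffers changes locally and ships them to a remote data center in batches."""

    def __init__(self, send_batch):
        self.send_batch = send_batch  # hypothetical transfer function taking a list of changes
        self.pending = []             # changes not yet replicated remotely

    def record(self, key, value):
        """Called on every local write; returns immediately."""
        self.pending.append((key, value))

    def flush(self):
        """Ship all pending changes, e.g. triggered by a timer every few minutes."""
        if not self.pending:
            return 0
        batch, self.pending = self.pending, []
        try:
            self.send_batch(batch)
        except Exception:
            # Remote side unreachable: keep the changes for the next attempt.
            self.pending = batch + self.pending
            raise
        return len(batch)

    def exposure(self):
        """Changes that would be lost if the local data center failed right now."""
        return list(self.pending)


remote = []  # stands in for the remote data center
replicator = EventualReplicator(remote.extend)
replicator.record("customer:1", "Main St 1")
replicator.flush()
replicator.record("customer:1", "Oak St 3")
print(replicator.exposure())  # [('customer:1', 'Oak St 3')]
```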

It is difficult to find any type of data in this example that requires immediate replication. A potential candidate is payment-related data indicating that a customer has lost his status as a reliable payer, for example as a result of fraud detection, and may therefore no longer place any orders. This information may be so important that immediate replication is the action of choice.
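For such data, a write can be reported as successful only after the remote data center has confirmed it. The sketch assumes a hypothetical client function `remote_write(key, value, timeout_s)` that returns `True` once the remote side has stored the value; the two-second timeout is an illustrative value.

```python
class ReplicationError(Exception):
    """Raised when the remote data center did not confirm a write."""


def write_immediately(key, value, local_store, remote_write, timeout_s=2.0):
    """Write `key` locally and to the remote data center before reporting success."""
    local_store[key] = value
    try:
        # Hypothetical remote client call; True means the remote copy is stored.
        acknowledged = remote_write(key, value, timeout_s)
    except TimeoutError:
        acknowledged = False
    if not acknowledged:
        # The local copy is kept; the caller decides whether to retry or roll back.
        raise ReplicationError(f"{key} not confirmed by remote data center")
    return True
```

Every such write pays at least one round trip between data centers, which is why the set of data handled this way should be kept small.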

An analysis along these lines has to be performed for each individual application. Applications that only require data to be replicated across data centers eventually remain viable. If the analysis shows that large amounts of data require immediate replication, the situation is really difficult: there is no ready-made solution at hand. Such cases have to be considered carefully. Why do large amounts of data need to be replicated immediately? Is it really necessary to replicate that much data? The answers to questions of this type are likely to be business-driven rather than technical.

To sum up: if persistence is taken seriously, data replication across data centers is required. But Internet bandwidth, which is orders of magnitude lower than the bandwidth available within data centers, prohibits immediate replication of large amounts of data. It is therefore necessary to identify, and keep as small as possible, the amount of data that really requires immediate replication. For all other data, eventual replication across data centers should suffice. It is also advisable to detect the outage of a data center automatically so that all futile communication attempts can be stopped.
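Outage detection can be as simple as tracking heartbeats from the remote data center. The class `DataCenterMonitor` below is a hypothetical sketch: the 30-second timeout is an assumed value, and the clock is injectable so the behaviour can be tested without waiting.

```python
import time


class DataCenterMonitor:
    """Marks a remote data center as down after missed heartbeats and
    suppresses replication traffic until a heartbeat arrives again."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_heartbeat = clock()

    def heartbeat(self):
        """Call whenever a heartbeat message from the remote site arrives."""
        self.last_heartbeat = self.clock()

    def is_reachable(self):
        return self.clock() - self.last_heartbeat <= self.timeout_s

    def send_if_reachable(self, send, payload):
        """Call `send(payload)` only while the remote data center counts as up."""
        if not self.is_reachable():
            return False  # skip futile traffic; the caller keeps the payload queued
        send(payload)
        return True
```

A missing heartbeat cannot tell a failed data center from a broken network path between the sites; either way, sending replication traffic is futile until heartbeats resume.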

Blog author

Stephan Kepser
