Ceph Object Storage as fast as it gets or Benchmarking Ceph

3.3.2014 | 9 minutes reading time

CenterDevice is a distributed document management and sharing software without any single centralized component. In our next evolution we are going to use the distributed object store Ceph for storing our encrypted documents. In this article, my colleague Daniel Schneller and I present benchmarks for CenterDevice’s current Ceph installation. As it turns out, Ceph is as fast as it gets. In fact, Ceph gives you a reliable, cheap, and performant alternative to commercial SANs and distributed file storage.

Setup

We use four Dell PowerEdge 510 servers with 128 GB RAM and 14×4 TB disks — two mirrored disks for the OS and 12 disks for Ceph storage. The servers are connected via 4×1 Gbit network interfaces with two redundant switches. Two network interfaces each are bonded to form a high availability link according to IEEE 802.3ad. This way, a link or switch can fail while connectivity continues over the other link. Furthermore, using layer 3+4 link aggregation via LACP, different TCP streams may use both links, enhancing network throughput.

Before measuring Ceph’s Object Store performance, we establish a baseline for the expected maximum performance by measuring the performance of the disks and the network.

Baseline – Disk

For the disk performance baseline, we proceed in two steps. First, we measure the performance of a single disk. Second, we determine the performance of the whole disk subsystem of one server by stressing all disks at once.

To determine the write speed, we use dd to write arbitrary data as fast as possible. It’s important to circumvent the OS’ disk cache to get realistic results (oflag). We write a 10 GB file by reading from /dev/zero:
> dd if=/dev/zero of=/var/lib/ceph/osd/ceph-10/deleteme bs=10G count=1 oflag=direct

For the second step, we start the same dd process for all 12 data disks simultaneously — every dd is put in the background to run in parallel.
> for i in $(mount | grep osd | awk '{print $3}'); do dd if=/dev/zero of="$i"/deleteme bs=10G count=1 oflag=direct & done

To determine the read speed, we read the created files, again bypassing the cache (iflag), first from one disk and then from all disks:
> time dd if=/var/lib/ceph/osd/ceph-10/deleteme of=/dev/null bs=10G count=1 iflag=direct
> for i in $(mount | grep osd | awk '{print $3}'); do dd if="$i"/deleteme of=/dev/null bs=10G count=1 iflag=direct & done

The results are shown in figure 1. A single disk writes at up to 154 MB/s and reads at up to 222 MB/s. With all 12 disks in use at the same time, the average per-disk speed drops to 93 MB/s for writes and 142 MB/s for reads, which is still a good result.
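
Multiplying the per-disk averages by the number of disks gives a rough estimate of what the disk subsystem of one server delivers in total. The sketch below only does that arithmetic. The function name aggregate_mbps is ours for illustration, and the estimate assumes that all 12 dd processes ran at roughly the average rate.

```shell
# Aggregate throughput of one server's disk subsystem, in MB/s.
# Arguments: average per-disk throughput in MB/s, number of disks.
aggregate_mbps() {
  echo $(( $1 * $2 ))
}

disks=12
write_total=$(aggregate_mbps 93 "$disks")    # parallel write average from the text
read_total=$(aggregate_mbps 142 "$disks")    # parallel read average from the text
echo "write: ${write_total} MB/s, read: ${read_total} MB/s"
```

Under these assumptions, one server sustains roughly 1.1 GB/s of writes and 1.7 GB/s of reads across its data disks, far more than a pair of bonded 1 Gbit/s links can carry.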

Figure 1 — Write performance for 1 and 12 disks in parallel.

Baseline – Network

Ceph suggests that there should be two separate designated networks: a cluster and a public network. Ceph uses the cluster network for internal synchronization as well as replication while the public network is handling client requests. We use four physical links, bonded together in pairs to form one link for each of the two networks. See the configuration for a bonded interface on Ubuntu below — we use a little trick to set the VLAN specific interface name to a human readable string instead of something like bond0.105. This makes the configuration less error prone.

> cat /etc/network/interfaces
...
auto bond2
iface bond2 inet manual
bond-slaves p2p3 p2p4 # interface to bond
bond-mode 802.3ad # activate LACP
bond-miimon 100 # monitor link health
bond-xmit_hash_policy layer3+4 # use Layer 3+4 for link selection
pre-up ip link set dev bond2 mtu 9000 # set Jumbo Frames

auto vlan-ceph-clust
iface vlan-ceph-clust inet static
pre-up ip link add link bond2 name vlan-ceph-clust type vlan id 105 # Little trick
  # to set human readable interface name
pre-up ip link set dev vlan-ceph-clust mtu 9000 # Jumbo Frames
post-down ip link delete vlan-ceph-clust # unset human readable interface name
address 10.102.5.12 # IP config
netmask 255.255.255.0
network 10.102.5.0
broadcast 10.102.5.255
...

The theoretical maximum performance for the bonded network interfaces is 2 Gbit/s = 250 MB/s. iperf is an excellent tool to measure network throughput which we use for this measurement. We start an iperf server process on node01 and two iperf clients sending via two TCP streams each on node02 and node03:

[node02] > iperf -c node01.ceph-cluster -P 2
[node03] > iperf -c node01.ceph-cluster -P 2
[node01] > iperf -s -B node01.ceph-cluster
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address node01.ceph-cluster
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.102.5.11 port 5001 connected with 10.102.5.12 port 49412
[ 5] local 10.102.5.11 port 5001 connected with 10.102.5.12 port 49413
[ 6] local 10.102.5.11 port 5001 connected with 10.102.5.13 port 59947
[ 7] local 10.102.5.11 port 5001 connected with 10.102.5.13 port 59946
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 342 MBytes 286 Mbits/sec
[ 5] 0.0-10.0 sec 271 MBytes 227 Mbits/sec
[SUM] 0.0-10.0 sec 613 MBytes 513 Mbits/sec
[ 6] 0.0-10.0 sec 293 MBytes 246 Mbits/sec
[ 7] 0.0-10.0 sec 338 MBytes 283 Mbits/sec
[SUM] 0.0-10.0 sec 631 MBytes 529 Mbits/sec

The results are rather disappointing as we achieve only roughly 513 Mbit/s + 529 Mbit/s or about 130 MB/s. What’s wrong here?

IEEE 802.3ad using layer 3 and 4

With the layer 3+4 hash policy, a bonded IEEE 802.3ad link uses the TCP connection 4-tuple (source IP, source port, destination IP, destination port) to determine the physical link over which to send data. [1] Yes, send. According to the standard, the sender decides which physical link to use. This means that our two senders (node02 and node03) each select a sending link and start to transmit data as fast as possible. But the switch connecting the senders to the receiver (node01) does not know about that. All it sees are layer 2 frames addressed to a specific MAC address, to which it forwards the frames. So while the two senders each fully utilize a 1 Gbit/s link, the switch forwards to the receiver via only one physical link, which results in the observed throughput. [2]
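
To make the sender-side link selection concrete, here is a minimal sketch of a layer 3+4 hash. It is modeled on the formula given in older versions of the Linux bonding driver documentation, ((source port XOR destination port) XOR ((source IP XOR destination IP) AND 0xffff)) modulo link count. Newer kernels and switch vendors use different hash functions, so treat this as an illustration and not as the exact computation on our machines. The function names ip_to_int and link_for_flow are hypothetical.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  set -- $1                      # split the address into its four octets
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Pick a link index for a flow.
# Arguments: src_ip src_port dst_ip dst_port link_count
link_for_flow() {
  local src dst
  src=$(ip_to_int "$1")
  dst=$(ip_to_int "$3")
  echo $(( (($2 ^ $4) ^ ((src ^ dst) & 0xffff)) % $5 ))
}

# The two TCP streams from node02 (10.102.5.12) to node01 (10.102.5.11),
# using the ports shown in the iperf output above.
link_for_flow 10.102.5.12 49412 10.102.5.11 5001 2
link_for_flow 10.102.5.12 49413 10.102.5.11 5001 2
```

With these inputs the two streams map to link 0 and link 1, so a sender can spread its streams over both of its links. The hash says nothing about the direction toward node01, which is decided separately by the switch.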

Baseline – Network: done right

If we switch the experiment’s parameters to one sender and two receivers, we get the expected results. This time, node01 sends to node02 and node03 simultaneously via two separate processes. We simply use netcat to send and pv to display the throughput.

[node02] > netcat -l -p 5001 > /dev/null
[node03] > netcat -l -p 5001 > /dev/null
[node01] > dd if=/dev/zero | pv | netcat node02.ceph-cluster 5001 &
173GB 0:25:36 [ 115MB/s]
[node01] > dd if=/dev/zero | pv | netcat node03.ceph-cluster 5001 &
173GB 0:25:36 [ 115MB/s]

Here we go. Now we see interface bonding in action — cf. figure 2.

Figure 2 — Network performance for 1 and 2 streams over a bonded network interface.

This means that the maximum transmission speed is around 230 MB/s — depending on the generated load and its processes. The maximum write speed for a single disk is around 150 MB/s and for all disks used at the same time around 90 MB/s. Reading speeds are 220 MB/s and 140 MB/s respectively.
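
The network figures in this baseline mix Mbit/s and MB/s, so it may help to spell out the conversion. The following sketch uses integer arithmetic with 8 bits per byte. The function name mbit_to_mbyte is ours for illustration.

```shell
# Convert a throughput in Mbit/s to MB/s (8 bits per byte, integer division).
mbit_to_mbyte() {
  echo $(( $1 / 8 ))
}

mbit_to_mbyte 2000              # two bonded 1 Gbit/s links: prints 250
mbit_to_mbyte $(( 513 + 529 ))  # sum of the two iperf [SUM] lines: prints 130
echo $(( 115 + 115 ))           # two netcat streams as reported by pv: prints 230
```

The one-sender measurement of about 230 MB/s therefore reaches roughly 92 % of the theoretical 250 MB/s, while the two-sender iperf run stays close to the 125 MB/s of a single link.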

Ceph Object Store

Ceph is built on RADOS, the Reliable Autonomic Distributed Object Store, which has no single point of failure because there is no central component. That makes it a perfect fit for CenterDevice’s architecture. In contrast to other distributed stores, Ceph uses an algorithm-only method to locate and store an object. Every client only needs to apply the CRUSH (Controlled Replication Under Scalable Hashing) algorithm to compute the object storage daemon (OSD) that is responsible for storing a particular object. Since different objects are stored by different OSDs, a client sending multiple objects talks to multiple nodes, which allows it to make use of bonded links as described above.
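
The idea of computing an object's location instead of looking it up can be illustrated with a deliberately simplified sketch. This is not the CRUSH algorithm, which also takes the cluster map, failure domains, and replica placement into account. It only shows that a client can derive a target OSD from the object name alone. The function name osd_for_object and the object name are hypothetical, and the OSD count of 48 assumes one OSD per data disk on our four servers.

```shell
# Toy placement: map an object name to one of osd_count OSDs by hashing the name.
# NOT the CRUSH algorithm; it only shows that every client can compute the
# placement on its own without asking a central directory.
osd_for_object() {
  local name=$1 osd_count=$2 crc
  crc=$(printf '%s' "$name" | cksum | cut -d' ' -f1)   # POSIX CRC of the name
  echo $(( crc % osd_count ))
}

# Hypothetical object name; 4 nodes x 12 data disks = 48 OSDs (assumption).
osd_for_object "document-4711" 48
```

Every client that runs this function with the same name and OSD count arrives at the same OSD index, which is the property that makes a central lookup service unnecessary.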

Ceph has an integrated benchmark program, rados bench, which we use for measuring the object store performance. In general, this benchmark writes objects as fast as possible to a Ceph cluster and reads them back sequentially afterwards; see rados bench --help for details. [3] Again, we proceed in two steps. First, we run only one load generator; second, we run four load generators, one on each Ceph node. It is important to note that every object is replicated once for data safety, leading to load on both the public and the cluster network.

[node01] > rados bench -p data 300 write --no-cleanup && rados bench -p data 300 seq # --no-cleanup leaves written data in ceph for read benchmark

We repeat this command in parallel on all four nodes. Figure 3 presents the results for both runs.

Figure 3 — Ceph read and write performance for 1 and 4 load generators.

The write speed of one load generator is quite surprising. At 245 MB/s, it almost reaches the theoretical network throughput. There are two explanations for this observation. First, since all nodes have 128 GB RAM, there is plenty of room to buffer data before actually writing it to the disks. In this way, network and disk I/O are decoupled. Second, the benchmark is running on the Ceph worker nodes themselves. Since we have 4 nodes, 1/4 of the data to write is delivered locally. Consistent with this, write speed drops significantly when load is generated by all 4 nodes. Interestingly, the average write performance is still greater than that of an individual disk, even though every object is replicated once, which doubles the load in the cluster network. The big surprise, though, is the read speed, which remains constant regardless of the workload.

When we compare these results with our baseline, we realize that even in high load situations (4 load generators for 4 worker nodes), the write and read speeds are higher than those of a single local disk. This is quite impressive, taking into account that all writes were duplicated for data safety.

Ceph Performance Conclusion

Our internally run tests consist of many more scenarios than presented in this blog post. During all our tests, Ceph did its job without any quirks. The performance remains very stable even in high load situations and independently of the amount of data stored. In our opinion, Ceph is an excellent choice to store large amounts of data outperforming our former solution. Ceph gives you a reliable, cheap, and performant alternative to commercial SANs and distributed file storage.

We also have to say that the observed performance is the result of a carefully tweaked hardware and software configuration. So while Ceph is an excellent choice for distributed object and file storage, its installation and configuration require a great deal of knowledge about Unix, networking, and Ceph internals.

There’s more

Ceph offers far more than the object storage described here. It allows you to configure multiple levels of redundancy (we are going to use threefold redundancy in production) and snapshots for versioning objects. On top of the object store, Ceph has an Amazon S3 and OpenStack Swift compatible REST gateway (RADOSGW) and even a file system layer called CephFS. We benchmarked these layers, too, but omitted them here for brevity.

Feel free to contact us if you have questions and comments.

Footnotes

1. If frames of a single TCP stream are sent over both links, out-of-order reception may occur. In such a scenario, TCP would assume packet loss due to duplicate ACKs and eventually reduce the sending speed significantly. Therefore, packets of one TCP stream always travel over the same wire.
2. Some frames are actually forwarded over the second link, as the average throughput is a little higher than the maximum single link speed of 125 MB/s.
3. Even though listed in the help output, random reads are currently not implemented.

Blog author

Lukas Pustina
