Transparent End-to-End security for Apache Kafka – Part 1

10.10.2016 | 9 minutes reading time

Apache Kafka comes with a lot of security features out of the box (at least since version 0.9). But one feature is missing if you deal with sensitive, mission-critical data: encryption of the data itself.

Sure, this could simply be accomplished by encrypting the disks on which the Kafka brokers store their data. But the unencrypted form may still reside in memory or other caches and, even worse, anyone who can access the message in question can just read it. So disk encryption alone is not enough for sensitive data. In addition to “just encrypting the data” we need a mechanism to make sure no one altered the data after the producer sent the message. Keep in mind that Kafka, being a distributed system, has to deal with insecure networks, and SSL/TLS may not protect your data under all circumstances (weak cipher suites, man-in-the-middle attacks, OpenSSL bugs, etc.). Furthermore, with SSL/TLS enabled Kafka can no longer leverage the sendfile syscall (which writes data from the page cache directly to a socket).

To meet all these requirements the producer has to encrypt the messages before pushing them over the wire into Kafka, and the consumer needs to decrypt them upon retrieval. So both endpoints of the communication link handle the security aspects, which is called End-to-End security (or sometimes End-to-End encryption). The advantage of this approach is that it is totally transparent for Kafka and all other components between producer and consumer.

Encryption algorithm for Kafka

Because Kafka is a high-volume, low-latency message broker we need a fast (but still secure) encryption algorithm which can encrypt an arbitrary amount of data. The obvious choice here is AES (Advanced Encryption Standard), mainly because of the widespread hardware support which is available. Modern Intel and AMD processors support AES en-/decryption natively within the CPU, which is a lot faster than AES software implementations. The problem with AES in our context is that it is a symmetric cipher, which means that there is only one key which is used for encryption as well as decryption. This leads to the question of how a secure key exchange between producer and consumer can be accomplished. The short answer is: it is not possible if you do not already have a secure channel (and having one would probably make establishing another one unnecessary).

So we need another mechanism to conduct a secure key exchange. Luckily this challenge is already solved by asymmetric cryptography, where encryption and decryption are performed with different keys (they belong together and are often referred to as a key pair). One key is the so-called public key, which is used for encryption, and the other one is called the private key and can decrypt messages encrypted with the corresponding public key. The private key cannot be derived from the public key, so it is safe to transfer the public key over an insecure channel.

One could now ask why we do not just use an asymmetric cipher instead of AES and be done. The reason is that asymmetric ciphers are much slower than symmetric ones and cannot encrypt data bigger than the cipher’s key size (minus the room the padding needs). In our case (let’s choose RSA for now as our asymmetric cryptosystem) the key length considered secure nowadays is typically 4096 bits (= 512 bytes), but for practical reasons we go with 2048 bits (= 256 bytes) here. To use RSA as the only algorithm we would need to chunk our data into pieces of less than 256 bytes. That does not sound very practical. But RSA is well suited to encrypt our AES key (which is either 16 or 32 bytes long). So let’s do this.
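The hybrid idea from this paragraph can be sketched with the plain JDK as follows. This is only an illustration and not the library's code: the class and method names (KeyWrapSketch, wrap, unwrap) are made up for this example, and the OAEP transformation string is the one the article settles on further down.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: RSA only protects the small AES key, never the payload.
final class KeyWrapSketch {
    private static final String RSA_ALGO = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair newRsaKeyPair(int bits) {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(bits);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static SecretKey newAesKey(int bits) {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(bits);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Producer side: encrypt the raw AES key bytes with the RSA public key.
    static byte[] wrap(SecretKey aesKey, PublicKey publicKey) {
        try {
            Cipher rsa = Cipher.getInstance(RSA_ALGO);
            rsa.init(Cipher.ENCRYPT_MODE, publicKey);
            return rsa.doFinal(aesKey.getEncoded());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Consumer side: recover the AES key with the RSA private key.
    static SecretKey unwrap(byte[] wrapped, PrivateKey privateKey) {
        try {
            Cipher rsa = Cipher.getInstance(RSA_ALGO);
            rsa.init(Cipher.DECRYPT_MODE, privateKey);
            return new SecretKeySpec(rsa.doFinal(wrapped), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a 2048-bit RSA key the wrapped AES key is always 256 bytes long, regardless of whether the AES key itself has 16 or 32 bytes.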

Now we need to make the usage of AES and RSA encryption semantically secure. That means that encrypting slightly different messages (which share some identical content) with the same key must not produce ciphertexts that leak information about the messages or allow the key to be derived. To achieve this for AES-encrypted messages you either have to generate a new random key for every message, or use the same key but choose an AES mode which supports initialization vectors (IV) and use a different IV for every message. For RSA we need to apply a randomized encryption padding scheme such as Optimal Asymmetric Encryption Padding (OAEP).
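To make the role of the IV more tangible, here is a minimal sketch, assuming the CBC mode the article picks further down. The class AesIvSketch and its methods are hypothetical names for this example only.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: same key, fresh random IV for every message.
final class AesIvSketch {
    private static final String AES_ALGO = "AES/CBC/PKCS5Padding";
    private static final SecureRandom RANDOM = new SecureRandom();

    // A CBC IV has the size of one AES block, i.e. 16 bytes.
    static byte[] newIv() {
        byte[] iv = new byte[16];
        RANDOM.nextBytes(iv);
        return iv;
    }

    static byte[] encrypt(byte[] key, byte[] iv, byte[] plain) {
        return run(Cipher.ENCRYPT_MODE, key, iv, plain);
    }

    static byte[] decrypt(byte[] key, byte[] iv, byte[] encrypted) {
        return run(Cipher.DECRYPT_MODE, key, iv, encrypted);
    }

    private static byte[] run(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher aes = Cipher.getInstance(AES_ALGO);
            aes.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return aes.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Encrypting the same plaintext twice with the same key but two IVs from newIv() yields two different ciphertexts, which is the property this paragraph asks for. The IV is not secret, but the consumer needs it for decryption, so it has to travel with the message.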

As mentioned above, RSA en-/decryption is pretty slow, so we need to avoid it as much as possible. That is why it makes sense to use the same AES key for encrypting more than one message and to use an initialization vector to ensure semantic security. On the decryption side we need to somehow cache the decrypted AES key. To know which AES key is used for a particular message without transmitting it in plaintext (or decrypting it constantly), it is necessary to add a key hash which serves as the cache ID. To prevent hash collision attacks the hash needs to be cryptographically secure. For now let us use SHA-256 for that.
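A consumer-side cache along these lines could look like the following sketch. It is an assumption about how such a cache might be built, not the library's implementation: AesKeyCacheSketch and resolve are made-up names, and the expensive RSA step is passed in as a Supplier so the example stays self-contained.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical cache: SHA-256(key) -> plain AES key.
final class AesKeyCacheSketch {
    // ByteBuffer compares by content, whereas a raw byte[] compares by identity.
    private final Map<ByteBuffer, byte[]> cache = new ConcurrentHashMap<>();

    // Producer side: the cache ID that is sent along with every message.
    static byte[] sha256(byte[] key) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(key);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    // Consumer side: rsaDecrypt only runs when the hash has not been seen before.
    byte[] resolve(byte[] keyHash, Supplier<byte[]> rsaDecrypt) {
        return cache.computeIfAbsent(ByteBuffer.wrap(keyHash.clone()), h -> rsaDecrypt.get());
    }
}
```

The map key is a defensive copy of the hash so that later changes to the caller's array cannot corrupt the cache. A production cache would probably also want an eviction policy, which is left out here.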

As a last piece it would be nice if the consumer could detect whether a particular message is encrypted or not. If it is not, the decryption is simply skipped. This can be achieved by introducing magic bytes to label encrypted messages.

With that background wiring all this together leads to the following high level message processing chain:

O: Original plain message (arbitrary bytes)
K: Plain AES key
M: Magic bytes (0xDF 0xBB)
hash(K): SHA-256 hash of plain AES key
rsa(K): RSA encrypted plain AES key
aes(O): AES encrypted message
IV: Initialization Vector
L: Length information about hash(K), rsa(K) and IV

In this case we use “AES/CBC/PKCS5Padding” for AES en-/decryption and “RSA/ECB/OAEPWithSHA-256AndMGF1Padding” for RSA en-/decryption.

Producer:
1) If no AES key exists create a random one → K
   a) Encrypt AES key with RSA public key → rsa(K)
   b) Calculate SHA-256 hash of AES key → hash(K)
2) Generate random initialization vector → IV
3) Encrypt message with AES key and IV → aes(O)
4) Replace original message O with M-L-hash(K)-rsa(K)-IV-aes(O)

Consumer:
1) Check magic bytes (M). Bypass unencrypted messages
2) Extract hash(K) by looking at L
3) Extract IV by looking at L
4) If hash(K) is in cache get plain AES key (K)
5) If hash(K) is not in the cache, decrypt rsa(K) to get the plain AES key K (and put it into the cache)
6) Decrypt aes(O) with K and IV
7) Replace M-L-hash(K)-rsa(K)-IV-aes(O) with O

So rsa(K) is only necessary when a new AES key is used (by the producer) and/or the cache needs to be populated (by the consumer). The drawback of caching the AES key is that it resides permanently in producer and consumer JVM memory. But without caching, the decryption process in particular is too slow to be useful (see benchmarks).
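The framing part of the two chains above (producer step 4 and consumer steps 1 to 3) can be sketched as follows. Note the assumptions: the article does not spell out how L is encoded, so this sketch uses one byte for the hash length, two bytes for the rsa(K) length (256 does not fit into a single byte) and one byte for the IV length. That gives a 6-byte header, so this is not the library's actual wire format. The class name EnvelopeSketch is made up as well.

```java
import java.nio.ByteBuffer;

// Hypothetical container for M-L-hash(K)-rsa(K)-IV-aes(O).
final class EnvelopeSketch {
    static final byte[] MAGIC = {(byte) 0xDF, (byte) 0xBB};

    final byte[] keyHash;
    final byte[] rsaKey;
    final byte[] iv;
    final byte[] aesPayload;

    EnvelopeSketch(byte[] keyHash, byte[] rsaKey, byte[] iv, byte[] aesPayload) {
        this.keyHash = keyHash;
        this.rsaKey = rsaKey;
        this.iv = iv;
        this.aesPayload = aesPayload;
    }

    // Producer step 4: build the byte sequence that replaces the original message.
    byte[] pack() {
        ByteBuffer out = ByteBuffer.allocate(
                6 + keyHash.length + rsaKey.length + iv.length + aesPayload.length);
        out.put(MAGIC);
        out.put((byte) keyHash.length);       // assumed encoding: 1 byte
        out.putShort((short) rsaKey.length);  // assumed encoding: 2 bytes, big-endian
        out.put((byte) iv.length);            // assumed encoding: 1 byte
        out.put(keyHash).put(rsaKey).put(iv).put(aesPayload);
        return out.array();
    }

    // Consumer steps 1-3: returns null if the message does not start with the magic bytes.
    static EnvelopeSketch unpack(byte[] message) {
        if (message.length < 6 || message[0] != MAGIC[0] || message[1] != MAGIC[1]) {
            return null; // unencrypted message, the caller passes it through unchanged
        }
        ByteBuffer in = ByteBuffer.wrap(message, 2, message.length - 2);
        byte[] keyHash = new byte[in.get() & 0xFF];
        byte[] rsaKey = new byte[in.getShort() & 0xFFFF];
        byte[] iv = new byte[in.get() & 0xFF];
        in.get(keyHash).get(rsaKey).get(iv);
        byte[] payload = new byte[in.remaining()];
        in.get(payload);
        return new EnvelopeSketch(keyHash, rsaKey, iv, payload);
    }
}
```

A truncated message that happens to start with the magic bytes makes unpack throw a BufferUnderflowException. A real implementation would need to decide whether to treat that as an error or as an unencrypted message.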

Byte sequence of an encrypted message

This adds a constant overhead of 309 bytes to each message: 5 bytes for the header, 32 bytes for the SHA-256 hash, 256 bytes for the RSA-encrypted AES key and 16 bytes for the IV. The encrypted message may also be up to 16 bytes bigger than the original message due to the PKCS5Padding which is used (block size 16; a full padding block is added when the message length is already a multiple of 16). So the maximum overhead in total is 325 bytes. That may be a lot: if you only handle small messages, you easily double your data size. The only thing we could do here is use a weaker RSA key with only 1024 bits. This would reduce the maximum overhead by 128 bytes to 197 bytes. But 1024-bit keys are considered vulnerable. A weaker hash algorithm producing a shorter hash is also not an option. And finally we can neither omit the IV nor put it into the RSA-encrypted part (because that would make caching impossible). So increased message size as well as additional CPU cycles for AES en-/decryption are the costs of security.

But that’s enough theory, let’s look at the implementation:

The basic idea for transparent end-to-end encryption in Kafka is to write a Serializer and a Deserializer which wrap the original Serializer and Deserializer and add the en-/decryption processing transparently.

Implementing a delegating Serializer and Deserializer is easy. We just need to implement two classes for that:

org.apache.kafka.common.serialization.Deserializer<T>:
// decrypt the raw bytes first, then hand them to the wrapped deserializer
public T deserialize(String topic, byte[] data) {
    return originalDeserializer.deserialize(topic, decrypt(data));
}

org.apache.kafka.common.serialization.Serializer<T>:
// let the wrapped serializer produce the bytes, then encrypt them
public byte[] serialize(String topic, T data) {
    return encrypt(originalSerializer.serialize(topic, data));
}

The encryption and decryption is done entirely with the standard Java crypto implementation, which utilizes AES-NI instructions (if the CPU supports them) and also has the advantage of having no external dependencies.

En/Decryption of a byte array is pretty simple:

1) Cipher cipher = Cipher.getInstance(algo)
2) cipher.init(mode, key, [IV])
3) byte[] result = cipher.doFinal(input)

algo: Algorithm (here “RSA” or “AES”), including mode and padding scheme
mode: encrypt or decrypt
key: The key
IV: Initialization vector (AES only)
input: The input bytes (plaintext or encrypted text), depends on mode
result: The output bytes (encrypted text or plaintext), depends on mode

Note: Instances of Cipher are not thread-safe, so a Cipher is best kept in a ThreadLocal.
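One way to follow this note is sketched below. It is a minimal example with a hypothetical class name (ThreadLocalCipherSketch), not the code used in the library: each thread lazily gets its own Cipher instance and re-initializes it for every call.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: one AES Cipher per thread instead of one shared instance.
final class ThreadLocalCipherSketch {
    private static final ThreadLocal<Cipher> AES = ThreadLocal.withInitial(() -> {
        try {
            return Cipher.getInstance("AES/CBC/PKCS5Padding");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    });

    // mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE.
    static byte[] crypt(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = AES.get(); // never shared with other threads
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The lookup via Cipher.getInstance then happens once per thread rather than once per message, while init and doFinal still run for every message.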

Use it

1) Include the library via Maven

<dependency>
 <groupId>de.saly</groupId>
 <artifactId>kafka-end-2-end-encryption</artifactId>
 <version>1.0.1</version>
</dependency>

or download it from https://github.com/salyh/kafka-end-2-end-encryption/releases/tag/v1.0.1

2) Generate an RSA key pair:

java -cp kafka-end-2-end-encryption-1.0.1.jar de.saly.kafka.crypto.RsaKeyGen 2048

3) Configure your producer and consumer:

Producer:

value.serializer: de.saly.kafka.crypto.EncryptingSerializer 
crypto.wrapped_serializer: org.apache.kafka.common.serialization.StringSerializer 
crypto.rsa.publickey.filepath: /opt/rsa_publickey.key

Consumer:

value.deserializer: de.saly.kafka.crypto.DecryptingDeserializer
crypto.wrapped_deserializer: org.apache.kafka.common.serialization.StringDeserializer
crypto.rsa.privatekey.filepath: /opt/rsa_privatekey.key
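If the clients are configured programmatically, the same settings can be collected in java.util.Properties objects, roughly as sketched below. The crypto.* keys and class names are the ones listed above. The class CryptoConfigSketch, the method parameters and the choice of String key (de)serializers are assumptions for this example; bootstrap.servers, group.id, key.serializer and key.deserializer are standard Kafka client settings.

```java
import java.util.Properties;

// Hypothetical helper that assembles the client configuration shown above.
final class CryptoConfigSketch {
    static Properties producerConfig(String bootstrapServers, String publicKeyPath) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        // Assumption: message keys are plain strings and stay unencrypted.
        p.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.setProperty("value.serializer", "de.saly.kafka.crypto.EncryptingSerializer");
        p.setProperty("crypto.wrapped_serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.setProperty("crypto.rsa.publickey.filepath", publicKeyPath);
        return p;
    }

    static Properties consumerConfig(String bootstrapServers, String groupId, String privateKeyPath) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        p.setProperty("group.id", groupId);
        p.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.setProperty("value.deserializer", "de.saly.kafka.crypto.DecryptingDeserializer");
        p.setProperty("crypto.wrapped_deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.setProperty("crypto.rsa.privatekey.filepath", privateKeyPath);
        return p;
    }
}
```

The resulting objects would then be handed to the KafkaProducer and KafkaConsumer constructors from the kafka-clients library. Only the consumer ever needs access to the private key file.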

Benchmarks

A recent MacBook Pro with Java 8 can encrypt approx. 300 MB/s on average and decrypt approx. 1350 MB/s on average (per thread).

Limitations

The design of End-to-End security discussed in this article does have some limitations. It currently provides only encryption and no kind of accountability or non-repudiation (because messages are not signed yet). Authentication and authorization are also not covered, but Kafka’s own mechanisms can be used for that. It also does not protect against someone sitting in the middle (man in the middle) dropping, replaying or reordering messages. There is also no forward secrecy. We will discuss and add some of these features in part 2 of this article.

Conclusion

This article and the provided implementation demonstrate how transparent end-to-end security can be applied to Kafka to add an enterprise-grade security feature. We discussed the nature of symmetric and asymmetric encryption systems, how they can be combined and how much overhead they add. In Part 2 we will discuss optimizations (like batching and compression of messages) and adding cryptographic signatures to establish a trusted relationship between various producers and consumers. In Part 3 we will have a look at how non-Java producers and consumers can be made ready for end-to-end security.

Download

https://github.com/salyh/kafka-end-2-end-encryption
https://github.com/salyh/kafka-end-2-end-encryption-bench-it

This article as well as the implementation was inspired by http://www.symantec.com/connect/blogs/end-end-encryption-though-kafka-our-proof-concept (credits to Jim Hoagland).

Apache, Apache Kafka, Kafka, and associated open source project names are trademarks of the Apache Software Foundation

Blog author

Hendrik Saly
