This article will help you understand the importance of zero-trust architecture and why it is the state of the art for protecting your organization from cyberattacks. We consider it fundamental knowledge for solution and system architects who design distributed systems.
In the past, companies could rely on firewalls and other perimeter-based security tools to keep attackers out of the internal network and thus reduce the attack surface. However, with the rise of cloud computing, remote work, and the Internet of Things (IoT), the network perimeter has become increasingly porous. The misconception that perimeter-based security can be applied to modern distributed systems makes it easier for attackers to gain access to corporate networks and systems.
Historically, companies fortified their networks using measures like VPNs, firewalls, and ACLs. This is no longer sufficient now that hyperscalers, managed services, and SaaS offerings are ubiquitous. Trust placed in these traditional tools can therefore create a false sense of security, potentially leading to major security incidents with significant business impact. To navigate this changing landscape effectively, it's crucial to explore modern security strategies that align with the evolving nature of technology and remote operations.
In this article, we will discuss the challenges of perimeter-based security and the benefits of zero-trust architecture. The concept of zero-trust architecture was initially conceptualized in the mid-90s and is now in high demand due to requirements from governments and companies in regulated fields. We will also provide a hacker's perspective on traditional and modern security architectures. If you are not interested in the story, simply skip the indented text.
John <gr4v3d1gg3r> Doe finds an old WordPress installation running at Valuable Inc. from the time when the business was starting out. The installation reveals that the company has experimented with a lot of different things in the early days.
John quickly finds a known vulnerability in the old product, which allows him to execute code on the machine. The installation is supposed to be reachable only from outside the company's network, but at some point its PostgreSQL database was moved to the intranet. To keep things working, the administrator allowed all of the database's existing connections through the firewall, which means that John can now connect to the PostgreSQL database on the intranet using the WordPress server as a jump host.
Perimeter-based security architecture and challenges
The traditional approach to network security is a DMZ architecture. The network is constructed with a well-defined perimeter that should protect an organization’s assets. The assumption is that the internal network is protected, so traffic inside it can be trusted. The borders of the network are fortified with firewalls, access-control lists, and other security measures to ensure that threats cannot reach the internal network.
Fortunately for John, company policy mainly requires servers outside the perimeter to be patched with the latest security updates. This is where the evil hackers are known to attack. Unfortunately for the company, John is already inside the intranet, and the database server has unpatched known vulnerabilities. This makes it easy for John to exploit one of them and gain control of the database server. John then uses the Metasploit framework to execute code on the host within the intranet. Metasploit is a powerful tool that makes hacking almost as easy as it is portrayed in movies.
John knows his next steps. He scans the intranet for valuable data. There is nothing to stop him now. He uses the ransomware he wrote himself to blackmail his new "client"... He hopes he has fixed the bug that prevented the last customer from decrypting the data after payment. He almost feels a bit sorry about that.
Once inside the network, traffic is generally trusted. Activity is mainly monitored at the ingress, so that suspicious behavior on edge nodes alerts the administrators. This gives intruders the option of lateral movement inside the network. Gaining access to other services and manipulating or extracting data are examples of actions undertaken once the network perimeter has been breached.
As initially mentioned, securing and defining the perimeter faces a growing number of challenges due to the evolution of network architectures. The adoption of cloud services and remote work gives less control over the network to the administrators and leaves more surface area for threats. The network perimeter is often maintained by third parties in scenarios that integrate cloud services into customer premises. Additionally, remote work will expose user devices to unknown threats that can make a penetration of the perimeter more likely.
Understanding zero-trust architecture
Zero-trust architecture eliminates the conventional security perimeter. It takes into account that no system is 100% secure. The goal is to build a network that remains resilient if any of its hosts is compromised. It does so by requiring all network entities to possess short-lived, verifiable identities. The system design seeks to minimize the potential scope of compromise on any host. To grasp how this ambitious goal can be achieved, a thorough understanding of the three core pillars of the architecture is essential.
The logical consequences are expressed in the main ideas of the model: "never trust, always verify" and the "principle of least privilege". Below we define zero-trust architecture in three pillars that aim to restrict what intruders can do and limit the spread of a security breach.
John feels the itch in his fingers again. The last time had worked well. He had been lucky that the two servers had old vulnerabilities, configuration errors, and access to the intranet with no additional security boundaries. But it was not just luck. This is something you are likely to find often in production systems.
The admins just couldn't keep up with the changes in business demands, no matter how good they were. And there was no well-defined strategy other than "It's your job to keep it safe."
John selects a new target. The attack surface seems promising after a preliminary enumeration. The services used by the target are spread across cloud providers. There is no firewall that must be evaded. He quickly discovers a small service where URL parameters are not properly validated, allowing for simple command injection leading to code execution.
The goal is clear: locate and tamper with valuable data storage again. The plan is straightforward: use one of the known services as a tunnel to gain access to the storage.
The system uses TLS to encrypt communication between all services. This is not unusual. However, John realizes that mTLS, which provides client identity validation, is enabled on non-public APIs. This is a major setback because he can only communicate with previously defined endpoints from the compromised service.
Pillar I: Assume breach
‘Assume breach’ is the assumption that a security breach is inevitable or has already occurred. With this assumption in mind, the system is designed so that the effect, or blast radius, of a breach is minimized. This is achieved through measures in software development and identity management. The following two pillars build on this assumption to create a more secure network in an organization.
Pillar II: Never trust, always verify
Implicit trust is omnipresent in a perimeter-based network. Zero-trust removes the distinction between traffic coming from outside and inside the network, so that no request is implicitly trusted. To carry out an action inside the network, the actor must always be verified. In practice this is often realized by mTLS connections between all services, combined with an allowlist of permitted communication paths.
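As a minimal sketch of the mTLS side of this, the following Python snippet builds a server-side TLS context that refuses clients without a valid certificate. The function name and its parameters are hypothetical; the certificate, key, and CA files are assumed to be provided by your own PKI or identity system.

```python
import ssl


def build_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Return a TLS server context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the handshake fail unless the client presents
    # a certificate that chains to one of the trusted CAs loaded below.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # Assumption: CA bundle that signs the workload certificates.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # Assumption: this service's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A context like this only establishes who the caller is. Whether that caller is allowed to talk to this service is a separate authorization decision.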
Users can be identified by an Identity and Access Management (IAM) system. Machine identities must be managed separately. Machine-to-machine identity verification against allowlists is vital in a zero-trust system. It restricts lateral movement to well-defined paths and makes it possible to detect suspicious behavior through a Security Information and Event Management (SIEM) system.
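The idea of well-defined paths can be illustrated with a small sketch. The SPIFFE-style identities, the `PathAuthorizer` class, and the in-memory `events` list (a stand-in for forwarding to a real SIEM) are all assumptions made for illustration, not part of any specific product.

```python
from dataclasses import dataclass, field

# Hypothetical machine identities and the only paths they may use.
FRONTEND = "spiffe://example.org/frontend"
ORDERS = "spiffe://example.org/orders"
ORDERS_DB = "spiffe://example.org/orders-db"

ALLOWED_PATHS = frozenset({
    (FRONTEND, ORDERS),
    (ORDERS, ORDERS_DB),
})


@dataclass
class PathAuthorizer:
    allowed: frozenset
    # Stand-in for a SIEM forwarder; a real system would ship these events.
    events: list = field(default_factory=list)

    def authorize(self, caller: str, callee: str) -> bool:
        """Allow only explicitly listed (caller, callee) pairs."""
        if (caller, callee) in self.allowed:
            return True
        # A verified identity on an unlisted path is itself a signal.
        self.events.append(
            {"type": "denied_path", "caller": caller, "callee": callee}
        )
        return False
```

In this sketch the frontend may call the orders service, but a direct call from the frontend to the database is denied and recorded, which is the kind of event a SIEM could alert on.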
Pillar III: Least privilege
Every actor should be given the least privileges needed to fulfill a given task. This further limits the blast radius of a security incident. Assigning only the minimal set of rights to a user significantly limits the harm an attacker can do with that user's identity.
Aside from using this minimal set of rights, the validity of those rights should be as short-lived as possible. The rights remain intact as long as the verification is refreshed. If an actor fails to refresh the verification, the rights are automatically revoked. Active revocation is not needed if the validity of an identity is short enough.
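The following sketch shows how short validity and a narrow scope can be combined in one credential. It uses a simple HMAC-signed token with an audience and an expiry time. The token format, the function names, and the hard-coded `SECRET` are assumptions for illustration only; a production system would typically rely on an established standard and a proper secret store.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-only-secret"  # assumption: loaded from a secret store in practice


def issue_token(subject, audience, ttl_seconds=300, now=None):
    """Issue a token valid for one audience and a short time window."""
    now = time.time() if now is None else now
    claims = {"sub": subject, "aud": audience, "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def verify_token(token, expected_audience, now=None):
    """Accept the token only if signature, audience, and expiry all check out."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # malformed token or invalid base64
        return False
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(payload)
    # No revocation list: an expired token simply stops being accepted.
    return claims["aud"] == expected_audience and now < claims["exp"]
```

Because the expiry is part of the signed payload, a holder who stops refreshing loses access once the window closes, without any explicit revocation step.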
John cannot use the next low-hanging fruit to make his way further into the network, as he did last time. He can only move along defined paths. He takes a closer look at the service he compromised and finds that he can capture access tokens for machine-to-machine communication there. Jackpot, he'll try to use the tokens to move on.
What he doesn't know is that the token can't be used everywhere and even if it could, it is only valid for a few minutes.
Trying to get further, John uses the token at a service that it was not issued for. The fact that a valid token is used in the wrong place triggers a SIEM event and alerts an admin, who quickly finds the breach and revokes the system's trust in the compromised host. Due to the short validity of the tokens, the compromised host is isolated from the rest of the system within a few minutes. John realizes he is at a dead end. His compromised host loses connectivity to other services because it is no longer trusted. His attack ends here.
Zero-trust architecture in software engineering
It is possible to include the principles of zero-trust architecture at various levels of software engineering:
- Create machine identities by using certificates, for example through SPIFFE / SPIRE.
- Send events to the SIEM system when identities behave abnormally.
- Integrate user identification, authentication, and authorization with a multi-factor approach to increase security and apply the principles of zero-trust. Limit rights to the minimal required set and only allow sessions to last as long as technically required. Refresh sessions when necessary.
- Use mTLS to enforce mutual verification and allowlisting of identities in all communication between microservices, APIs, and other software components.
Zero-trust has become established as the answer to the challenge of implementing network security in cloud environments. Governments and corporations have set themselves the goal of establishing zero-trust as a standard.
Establishing zero-trust is not trivial. The whole architecture of the network has to be built upon the aforementioned pillars. The identity management of machines and humans in particular can be a challenge.