Speed up your CI/CD jobs in Kubernetes

2.9.2021 | 7 minutes of reading time

A performant and well-integrated CI/CD environment is one of the key factors for fast and agile software development. To achieve short feedback cycles and increase development speed, jobs need to run as fast as possible and, ideally, start instantly to keep the runtime of your pipeline as low as possible.
This blog post will explain how to speed up your Kubernetes-based CI/CD infrastructure.

CI/CD with GitLab and Kubernetes

We use GitLab as our code-management tool. GitLab ships with a fully integrated CI/CD solution that supports executing your jobs on a Kubernetes cluster via the Kubernetes executor. Using this executor on an auto-scaling Kubernetes cluster can be a great way to run a dynamic CI/CD environment: it automatically provides users with the resources they need, and since the cluster auto-scales, costs are only incurred while resources are actually in use.
For each CI job that is triggered via a GitLab pipeline, the runner creates a new pod in the cluster. The workload on the cluster therefore varies strongly, with peak times and quiet times, often depending on the time of day.
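For reference, the relevant part of a runner's config.toml could look roughly like this. This is only a minimal sketch: the URL and token are placeholders, and the resource requests are merely an example of what a typical build pod might reserve.

[[runners]]
  name = "kubernetes-runner"
  url = "https://gitlab.example.com/"   # placeholder
  token = "REDACTED"                    # placeholder runner token
  executor = "kubernetes"
  [runners.kubernetes]
    namespace = "gitlab-ci"             # namespace in which build pods are created
    cpu_request = "500m"                # default resources reserved per build pod
    memory_request = "512Mi"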
Adding auto-scaling to such a cluster setup can be achieved with the Cluster-Autoscaler. This tool scales your cluster down to an absolute minimum in times with few or no build jobs and scales out to many nodes if a lot of jobs need to be processed.

How does the Cluster-Autoscaler work?

The Cluster-Autoscaler adds new nodes to the cluster if there are pods in the “unschedulable” state. With the default scan interval, a scale-up is triggered at most 10 seconds after a pod has been marked unschedulable.

unschedulable
A pod is considered unschedulable if there is no node suitable to host the workload. This might be the case if, for example, all resources are exhausted.

The Cluster-Autoscaler shuts down nodes if they are unneeded for at least 10 minutes. Nodes are considered unneeded if they are empty or if their workload can be shifted to the remaining nodes. Please refer to the documentation for more information on when pods are considered shiftable.
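The intervals mentioned above map to Cluster-Autoscaler flags. As a rough sketch, the relevant part of the autoscaler's container spec could look like this (the flag values shown are the defaults; the image tag and cloud provider are just examples):

containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0   # example tag
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws                    # adjust to your provider
      - --scan-interval=10s                     # how often unschedulable pods are checked
      - --scale-down-unneeded-time=10m          # how long a node must be unneeded before removal
      - --scale-down-utilization-threshold=0.5  # below this utilization a node counts as unneeded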

The problem with autoscaling in CI/CD environments

The runner will schedule a pod for every CI job in a GitLab pipeline. If there is free capacity in the cluster, this pod will start almost immediately and run your code. But what if the cluster’s resources are already fully allocated?
Depending on your setup and the chosen cloud provider, a scale-up may take some time, up to 5 minutes (including Kubernetes-related initialization), even on major cloud providers like AWS, GCP or Azure. In the worst case, that adds up to 5 minutes to nearly every job, regardless of whether the job itself needs 5 seconds or 20 minutes. That leads to very unhappy users and a lot of inefficiency.

Use overprovisioning to reduce startup overhead

One way to solve the problem described above is overprovisioning.
Overprovisioning means that the cluster always provides somewhat more resources than are actually needed. With overprovisioning in place, we can make sure that there is always some spare capacity, so that your CI jobs don’t have to wait for new capacity to become available.

Unfortunately, this is not a built-in feature of the Cluster-Autoscaler. To achieve cluster-size-dependent overprovisioning, the team behind the Cluster-Autoscaler proposes a solution in their FAQ, using the Cluster-Proportional Autoscaler (short: CPA) and a placeholder deployment.

How does the proposed solution work?

To achieve overprovisioning alone, you only need the placeholder deployment. The proposal is a deployment based on the pause image; the only purpose of its pods is to reserve the configured amount of resources.
To benefit from these reserved resources, the pause pods need to be evicted immediately when a build job is scheduled. This can be achieved with the PriorityClass resource in Kubernetes: by assigning a PriorityClass with a low priority to the placeholders and a higher priority to everything else, Kubernetes evicts the pause pods in favour of the CI jobs.
Because the placeholder is controlled by a Deployment, the evicted pods are rescheduled. If there are no free resources left in the cluster, they become unschedulable and the Cluster-Autoscaler triggers a scale-up.
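A minimal sketch of such a placeholder setup, close to what the Cluster-Autoscaler FAQ proposes, could look like this (names, replica count and resource values are placeholders and need to be adjusted to your cluster):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1            # lower than the default priority of 0 used by regular pods
globalDefault: false
description: "Low priority class for overprovisioning placeholder pods."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2        # later controlled by the CPA
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: k8s.gcr.io/pause   # does nothing except reserve the requested resources
          resources:
            requests:
              cpu: "1"
              memory: 1Gi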

To improve on this rather static approach, the FAQ suggests using the Cluster-Proportional Autoscaler. The CPA is a tool that scales a target resource based on the current cluster size: it constantly checks how many nodes are part of the cluster (or, alternatively, the total number of CPU cores) and adapts the number of replicas of the target resource as configured. With this component in place, you can control the number of placeholder pods based on the cluster size.

Examples

For example, you can configure the CPA so that it always scales the target to half as many replicas as there are CPU cores.
Alternatively, you can define a ladder function, for example: scale to 2 replicas if the cluster has up to 5 nodes, and to 7 replicas if it has more than 5 nodes. Both variants are sketched below.
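As a sketch, the two variants could look roughly like this in the CPA configuration (the numbers are only illustrative):

# Linear mode: one placeholder replica per 2 CPU cores in the cluster
linear:
  {
    "coresPerReplica": 2,
    "min": 1
  }

# Ladder mode: 2 replicas up to 5 nodes, 7 replicas from 6 nodes on
ladder:
  {
    "nodesToReplicas":
    [
      [ 0,2 ],
      [ 6,7 ]
    ]
  }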

A Helm chart to rule them all

At the time of writing this blog post, there was no Helm chart that installs all the necessary components in your cluster. There is a fairly new Helm chart for the CPA, which can be found here. To deploy the placeholder deployment and the PriorityClass setup, one could use this Helm chart by Delivery Hero.

But to make the installation as smooth and integrated as possible, we decided to create yet another Helm chart, combining both components and adding the option of using different overprovisioning configurations via schedules.
You can find the new Helm chart, called cluster-overprovisioner, on GitHub.
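Installation could then look roughly like the following. The repository URL is an assumption based on codecentric's public Helm chart repository; please check the chart's README for the authoritative instructions.

# repository location assumed, see the chart's README
helm repo add codecentric https://codecentric.github.io/helm-charts
helm repo update
helm install cluster-overprovisioner codecentric/cluster-overprovisioner \
  --namespace ci --create-namespace \
  --values my-values.yaml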

Without much configuration, this Helm chart deploys the CPA and, as its target, a placeholder deployment called overprovisioning (OP), including the PriorityClass setup for evicting the pause pods. The only parts you should adapt to your needs are the defaultConfig and the op.resources block. Examples and explanations for the former can be found in the Readme and the examples folder in the repo.
The latter needs to be adapted to your use case; in ours, we decided that each pause pod should reserve the capacity of an average CI job (see the sketch below).
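As a sketch, the corresponding part of a values file could look like this. The key structure is assumed from the chart description above; the resource numbers are only an example of what “one average CI job” might mean and need to be adjusted to your jobs.

op:
  resources:
    requests:
      cpu: "1"       # roughly one average CI job
      memory: 2Gi
    limits:
      cpu: "1"
      memory: 2Gi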

Example configuration with descending replicas

Currently, we use the following configuration:

ladder:
  {
    "nodesToReplicas":
    [
      [ 0,7 ],
      [ 8,4 ],
      [ 12,0 ]
    ]
  }

We use more overprovisioning for smaller cluster sizes and disable it completely once the cluster reaches 12 nodes.
Based on the typical runtime of our CI jobs, we assumed that the bigger the cluster gets, the more likely it is that some pods are about to terminate and free up space for new build jobs anyway. Therefore, we use the ladder mode with replicas descending as the cluster grows.

Use schedules to keep your bill under control

Assuming most of your developers work in the same or a similar time zone, you can most likely define time frames in which you can forgo the start-up boost provided by overprovisioning in favour of reducing your compute costs. For this, we introduced a scheduling feature in our chart. It is based on Kubernetes CronJobs and enables you to provide different configurations for the CPA using cron expressions.

schedules:
  - name: night
    # disable overprovisioning Monday - Friday from 6pm
    cronTimeExpression: "0 18 * * 1-5"
    config:
      ladder:
        {
          "nodesToReplicas":
          [
            [ 0,0 ]
          ]
        }
  - name: day
    # enable overprovisioning Monday - Friday from 7am
    cronTimeExpression: "0 7 * * 1-5"
    config:
      ladder:
        {
          "nodesToReplicas":
          [
            [ 0,7 ],
            [ 8,4 ],
            [ 12,0 ]
          ]
        }

The snippet above shows the schedules we have installed in our CI cluster. The night schedule completely disables overprovisioning after 6pm and on weekends. We do have scheduled jobs that run at night or even on weekends, but for these the longer startup time does not matter, as no one is waiting for them to complete.
The day schedule increases overprovisioning back to the desired amount at 7am from Monday to Friday.

As you can imagine, adding overprovisioning to your cluster increases your total costs. Instead of providing the same amount of overprovisioning 24/7, we strongly recommend making use of the schedules. This way, you achieve the best balance between low startup times and additional costs.

Conclusion

CI/CD infrastructures on Kubernetes benefit greatly from adding autoscaling to the cluster. It reduces compute costs to an absolute minimum in times without build jobs and still handles the busiest times. Implementing overprovisioning on top reduces the startup times of your jobs. To minimize the additional costs introduced by overprovisioning, we added a scheduling feature with which you can enable overprovisioning only when it is needed and achieve a good balance between performance and costs.
