
AI Code Tsunami Hits the QA Dam: The End of Balanced Velocity

30.3.2026 | 8 minutes reading time

Note upfront: This article is specifically aimed at teams working on the modernization and further development of existing systems, not at greenfield projects where completely different rules apply.

Everyone is talking about the massive productivity boost from Artificial Intelligence in software development. Tools like GitHub Copilot and other AI assistants promise that we can implement features in a fraction of the time it used to take. And indeed, code generation has become incredibly fast.

But if we look at the reality of existing software projects and established teams, a different picture often emerges. Developers are faster, but the business value doesn't reach the end customer any quicker. Why is that? We are falling prey to an "AI Speed Illusion". We have forgotten that writing code is only one part of the equation.

Of course, software development has countless variables. When looking at delivery speed, however, four sectors are essential: Requirements, Development, Quality Assurance (QA), and Delivery. An isolated speed boost in just one of these areas does not make the overall system faster.

Often, the bottleneck is right at the beginning, when clarified requirements merely trickle into development like a small stream. But even if the requirements flow and the AI turns this stream into a code tsunami in record time, the next bottleneck awaits right before delivery. In this article, we want to focus on exactly this part of the equation: QA.

The Decade of Equilibrium

To understand the problem, we need to look at how existing teams work away from the greenfield. Many of these projects have been running smoothly for years or even decades. Over time, a natural equilibrium has established itself within these teams: a balanced velocity across all four sectors – from requirements to delivery.

Developers wrote code at a certain pace. The test infrastructure and the often manual QA stages were tuned to validate this output in a timely manner. It was a rhythm that worked: dev speed and QA approval were in perfect balance.

The Tsunami and the Dam

And then came AI.

Within a very short time, the speed at which code is produced has multiplied. (Just how extreme this speed can be in a greenfield scenario is shown in our experiment: A complete PoC in 5 minutes with AI-Assisted Coding). The historical equilibrium was washed away. Development is now generating a veritable code tsunami.

The problem for existing teams is that the processes, the test infrastructure, and the people in QA have remained exactly the same. The tsunami crashes unhindered against the existing QA dam. If test runs for a feature take hours, or if manual test stages performed by QA testers remain in place, the code piles up. The features might be "Code Complete", but they are not in production. The actual added value for the business does not increase. It remains stuck in the QA bottleneck.

Important to understand: This is not a failure of QA! It is simply a system that was optimized for a completely different speed and for project situations of the past.

The Boomerang Effect: Why AI Code Without Fast Feedback Becomes Instant Legacy Code

This traffic jam in front of QA doesn't just impact time-to-market; it inevitably hits back at the development team. Because eventually, the code gets tested. And when bugs are found, the ticket lands back on the developer's desk. The problem here is that, due to the bottleneck, this feedback often arrives hours or days later.

Now, one might say: "Context switching and waiting for QA have always been exhausting and expensive." That's true. But when we have to fix a bug in AI-generated code, a new, dangerous dimension is added: the lack of mental anchoring.

When a developer types out a complex feature entirely on their own, they build a deep mental model of the logic, line by line. If this code fails in manual QA three days later, they remember their train of thought. However, when we build code with the help of AI, we often switch to the role of a reviewer. We accept blocks of code. The mental model we build in the process is significantly shallower.

If a test run now takes hours or QA feedback takes days, and we have to fix a subtle bug, a tedious reverse-engineering process begins for code that, strictly speaking, we never really wrote ourselves. The cognitive load is enormous. Without immediate feedback, we are essentially creating Instant Legacy Code – code that has just been generated, but which no one on the team understands in detail anymore. We are experiencing software entropy on steroids: the creeping decay of the codebase and the build-up of complexity, which used to take months, suddenly happens in a few weeks with AI-generated code. Fast and automated feedback in minutes is therefore the only way to catch bugs while this shallow AI context is still fresh in the developer's mind.

The Test Pyramid is More Important Than Ever: When is a Test Suite AI-Ready?

To truly practice AI-Powered Development in legacy projects, there is only one way out. We don't have to reinvent the test pyramid, but we have to take it more seriously than ever before. The classic test pyramid never went away, but in the age of AI, it is elevated from an important architectural concept to an absolute survival strategy.

Only teams that work without bottlenecks can put the AI's horsepower on the road. But what does that mean in concrete terms? A test suite is only truly AI-ready if it meets the following criteria:

  • Blazing Fast Base: Unit and component tests finish in under 5 minutes. They are the first and most important filter for AI-generated code.
  • Fast Feature Cycle: Feedback at the feature level arrives within a single focused working block, i.e. in under 30 minutes.
  • Trustful Test Suite: It's not enough that "the code compiles". Developers must be able to trust it blindly. A green test run means the entire feature and application work flawlessly. Without this trust, developers will request manual re-tests at every step as a safety net.
  • Readable Error Messages: Error messages and log messages must be as clear as possible and provide concrete hints about the error. A generic "End-to-End test red, good luck" helps neither the developer nor the AI. The AI can only provide real value in automated debugging if the logs are precise.
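What a "readable error message" looks like in practice is best shown in code. The following is a minimal sketch; the function `calculate_discount` and its business rule are purely illustrative, not taken from a real project:

```python
# Hypothetical example: a unit test whose failure output tells the
# developer (and the AI) exactly what went wrong, instead of a bare
# "assertion failed".

def calculate_discount(order_total: float, is_loyalty_member: bool) -> float:
    """Toy business rule: loyalty members get 10% off orders over 100."""
    if is_loyalty_member and order_total > 100:
        return round(order_total * 0.10, 2)
    return 0.0

def test_loyalty_discount_applied_above_threshold():
    discount = calculate_discount(order_total=150.0, is_loyalty_member=True)
    # The message carries the business context a debugging session needs.
    assert discount == 15.0, (
        f"Loyalty member with order 150.00 should get a 10% discount (15.00), "
        f"but got {discount}"
    )

def test_no_discount_for_non_members():
    discount = calculate_discount(order_total=150.0, is_loyalty_member=False)
    assert discount == 0.0, f"Non-members get no discount, but got {discount}"
```

A test like this runs in milliseconds and, when it fails, its message alone is often enough context for the AI to propose a correct fix without a human reverse-engineering session.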

AI Writing Our Tests? A Reality Check and the TDD Comeback

Now one might argue: "Then let's just have the AI write the tests too!"

Test generation has undoubtedly improved a lot. However, practice in grown systems shows that the more integrative and complex the dependencies become, the more the AI reaches its limits. (If you want to know exactly where the current limits of AI assistants lie in everyday development, I recommend our article Bugs, Refactoring, Tests: Where Chatbots Shine in Coding and Where They Fail).

Here, a surprising phenomenon reveals itself in practice: The return to Test-Driven Development (TDD), but in the new guise of AI. It is much more effective to define well-thought-out, sharp tests upfront. These tests provide the AI with the exact guardrails, business context, and edge cases it needs. The AI generates significantly better production code from pre-written, strong tests than when asked to invent tests retroactively for its own production code. Tests become the most powerful prompt a developer can have.
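To make the "tests as the most powerful prompt" idea concrete, here is a minimal sketch of the workflow. The function `parse_price` and its German price-format rules are hypothetical examples, chosen only to show how pre-written tests carry business context and edge cases:

```python
# A minimal sketch of "tests as the prompt": the developer writes the
# specification first; the AI's only job is to make it green.

# --- Specification, written by the developer BEFORE any production code ---

def spec_parse_price():
    # Edge cases encode the business context the AI would otherwise miss:
    assert parse_price("19,99 EUR") == 1999        # German decimal comma -> cents
    assert parse_price("1.299,00 EUR") == 129900   # thousands separator
    assert parse_price("0,00 EUR") == 0
    try:
        parse_price("-5,00 EUR")
        raise AssertionError("negative amounts must be rejected")
    except ValueError:
        pass

# --- Production code, generated afterwards against the spec above ---

def parse_price(raw: str) -> int:
    """Parse a German-formatted EUR price string into integer cents."""
    amount = raw.removesuffix(" EUR").replace(".", "").replace(",", ".")
    cents = round(float(amount) * 100)
    if cents < 0:
        raise ValueError(f"Negative amounts are not allowed: {raw!r}")
    return cents

spec_parse_price()  # green means the generated code honors the spec
```

Note how much implicit knowledge the four assertions transport: decimal commas, thousands separators, cents as integers, and the rejection of negative amounts. A prose prompt would have to spell all of this out and could still be misread.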

How well this principle works in practice and how to force the AI to write clean code against hard test specifications is shown in the excellent codecentric blog post No Cheating Allowed: Isolated Specification Tests with Claude Code.

Tests as Living Memory and Indispensable Safety Net

Another crucial and often overlooked point is that tests are not just pure quality assurance. They act as an external memory for context that the AI would otherwise lose or simply never know.

This also shows why regression tests are so extremely important in the AI era. When the AI refactors existing code or integrates new features into a grown system in the future, comprehensive regression tests are the only safety net ensuring that historically, painstakingly developed functions do not break unnoticed. If agreed-upon behavior or complex edge cases are encoded in acceptance or unit tests, the AI will have much easier access to them than to textual descriptions gathering dust in some Confluence wiki.
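A sketch of what such a "living memory" regression test can look like. The function `legacy_vat_rate` is a hypothetical stand-in for grown production code; the pinned-down rule itself, the temporary German VAT cut from 19% to 16% in the second half of 2020, is real:

```python
# Sketch: a regression test that pins down a painstakingly clarified
# edge case, so future AI-driven refactorings cannot break it silently.

from datetime import date

def legacy_vat_rate(invoice_date: date) -> float:
    """German VAT was temporarily lowered to 16% from July to December 2020."""
    if date(2020, 7, 1) <= invoice_date <= date(2020, 12, 31):
        return 0.16
    return 0.19

def test_vat_cut_window_2020():
    # This rule may once have lived only in a meeting protocol; as a test,
    # it is machine-readable context for every future change.
    assert legacy_vat_rate(date(2020, 6, 30)) == 0.19
    assert legacy_vat_rate(date(2020, 7, 1)) == 0.16
    assert legacy_vat_rate(date(2020, 12, 31)) == 0.16
    assert legacy_vat_rate(date(2021, 1, 1)) == 0.19
```

An AI asked to refactor this billing code will see the four boundary assertions immediately, while a prose note about the 2020 VAT cut in a wiki would almost certainly never enter its context window.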

A large test codebase is therefore absolutely not ballast. It is the long-term memory of the system, a living and machine-readable documentation, and a massive boost for future AI development.

Conclusion: No AI Success Without Test Excellence

"Cramming AI into old processes only creates traffic jams faster — we see this with customers every day. The real leverage doesn't lie in code generation, but in the question of whether the entire delivery pipeline can keep up. If you don't bring your test pyramid and CI/CD up to AI speed, you're merely shifting the problem from the developer to QA. Investing in test excellence is no longer optional; it is the ticket to AI-native development." – Kai Lichtenberg, Head of Business Unit at codecentric

We are in a time of upheaval. Anyone who wants to benefit from AI in software modernization must not just look at code generation.

Investing in AI development tools only truly pays off at the end of the day if parallel investments are made in the test infrastructure, CI/CD pipelines, and the execution times of test suites. If you neglect your test pyramid, you won't get faster. You merely shift the problem and produce a traffic jam in front of the dam gates.

It has never been as important as it is today to have an excellent, fast test suite. It is the foundation upon which AI speed can happen in the first place.


Recommended reading: If AI takes over writing code and we have to focus more on architecture, tests, and quality assurance – what does that mean for our jobs? Read more in our article: Will AI Replace Software Developers?
