Why good metrics values do not equal good quality

3.10.2011 | 7 minutes reading time

Quite regularly, codecentric’s experts perform reviews and quality evaluations of software products. For example, clients may want to get an independent assessment of a program they had a contractor develop. In other cases, they request an assessment of software developed in-house to get an understanding of its current level of quality.

There is often an implicit assumption that automatic analysis tools alone give a reliable impression of quality and maintainability, saving the cost and effort of a manual review. Using a simplified example, we are going to explain why this is a fallacy and why an automatically derived set of metrics cannot be a viable replacement for the manual process.

Metrics and Tools

In fact, at the beginning of most analyses there is a step of collecting some base metrics automatically, to get a first superficial impression of the software under inspection. Usually at this early stage one uses simple counts – e. g. to get an idea of the product’s size (number of packages, classes, methods, lines of code) – as well as common quality metrics, for example the cyclomatic complexity.

These values can be quickly calculated using several free or commercial tools and are based on the source code and compiled Java classes.

Once these metrics have been measured, they can be compared to well-known references, e. g. those of Carnegie Mellon University for cyclomatic complexity.

Cyclomatic Complexity

The purpose of this metric is to get an assessment of the complexity – and therefore indirectly the maintainability – of a piece of software.

The aforementioned reference values from Carnegie Mellon define four rough ranges for cyclomatic complexity values:

methods with values between 1 and 10 are considered simple and easy to understand and test
values between 10 and 20 indicate more complex code, which may still be comprehensible; however testing becomes more difficult due to the greater number of possible branches the code can take
values of 20 and above are typical of code with a very large number of potential execution paths and can only be fully grasped and tested with great difficulty and effort
methods going even higher, e. g. >50, are certainly unmaintainable
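To make the metric more concrete, here is a minimal sketch of a small method with its decision points counted by hand. It assumes the common counting rule (one for the method itself, plus one for each `if`, loop, `case` and short-circuit boolean operator); individual tools may count slightly differently. The class `ShippingCost`, its prices and the labels returned by `range` are made up for illustration, and the exact boundaries between the ranges are an assumption.

```java
// Hypothetical example, not taken from any reviewed code base.
final class ShippingCost {

    // Cyclomatic complexity, counted by hand:
    //   1  base path through the method
    // + 1  if (weightKg <= 0)
    // + 1  if (express ...
    // + 1  ... && weightKg < 5)
    // + 1  else if (weightKg > 20)
    // = 5
    static int costInCents(int weightKg, boolean express) {
        if (weightKg <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        if (express && weightKg < 5) {
            return 990;
        } else if (weightKg > 20) {
            return 1490;
        }
        return 490;
    }

    // Maps a complexity value onto the four rough ranges listed above.
    // Assumption: a value of exactly 10 counts as "simple" and
    // a value of exactly 20 already falls into the third range.
    static String range(int cc) {
        if (cc <= 10) return "simple";
        if (cc < 20) return "more complex";
        if (cc <= 50) return "hard to grasp and test";
        return "unmaintainable";
    }
}
```

With a value of 5, `costInCents` sits comfortably in the first range. Every further condition added over the years pushes that number up by one.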

Complexity often increases gradually over the lifetime of a code base as new features are added and existing code is modified. Each individual “small” change rarely seems complex enough to warrant refactoring the affected sections of the code, but the changes add up.

In effect, the risk of introducing new bugs grows with the code’s complexity, because undesirable side effects become harder to foresee. Theoretically this could be alleviated with a sufficient level of test coverage, but unfortunately coming up with useful test code also becomes more difficult and time-consuming for complex code. This regularly leads to test coverage becoming worse, making future changes even more error-prone. This is a vicious circle that is hard to break out of.

All this leads to a simple and unsurprising conclusion: Lower complexity eases maintenance, makes it easier to write meaningful tests and consequently reduces the chances of introducing new bugs. It can therefore be used as an indicator for good quality.

Let’s assume the following result of a complexity analysis of a code base with 10,000 methods:

96% – 9600 methods: CC < 17 : acceptable
3% – 300 methods: 17 <= CC < 20 : borderline
1% – 100 methods: 20 <= CC : too high

Does this mean that complexity is not a critical issue in this code base?

The answer has to be: No.

The statement that “only” 1% of all methods are reported as too complex does not carry much meaning in and of itself. From the numbers alone there is no way to tell whether those 100 methods contain central, mission-critical business logic and are therefore disproportionately important for the overall application’s quality.
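The following sketch illustrates the point. It buckets methods by their complexity value using the thresholds from the example above and shows that two code bases can yield the same summary although the complex method sits in very different places. All method names and values are invented, and treating a value of exactly 17 as “borderline” is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ComplexityReport {

    // Thresholds follow the example above; CC == 17 as "borderline" is an assumption.
    static String bucket(int cc) {
        if (cc < 17) return "acceptable";
        if (cc < 20) return "borderline";
        return "too high";
    }

    // Counts methods per bucket. The map key (the method name) is ignored on purpose:
    // this is exactly the information a percentage summary throws away.
    static Map<String, Integer> summarize(Map<String, Integer> ccByMethod) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("acceptable", 0);
        counts.put("borderline", 0);
        counts.put("too high", 0);
        for (int cc : ccByMethod.values()) {
            counts.merge(bucket(cc), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new LinkedHashMap<>();
        first.put("Logger.formatTimestamp", 34);  // peripheral helper
        first.put("PaymentService.book", 4);
        first.put("Invoice.total", 6);

        Map<String, Integer> second = new LinkedHashMap<>();
        second.put("Logger.formatTimestamp", 4);
        second.put("PaymentService.book", 34);    // core business logic
        second.put("Invoice.total", 6);

        // Both lines print the same summary, although the risk differs considerably.
        System.out.println(summarize(first));
        System.out.println(summarize(second));
    }
}
```

Only a look at which methods ended up in the “too high” bucket, and what they do, tells the two situations apart.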

Conversely, the complexity metric alone says nothing about how well this critical portion of the code may be covered by tests. Thorough testing could have been deliberately introduced to verify correctness and guard against regressions despite high complexity values. But we can get more information on that topic with more tools…

Test Coverage

Several tools are available to determine test coverage, a few popular ones being Clover, Cobertura or Emma. They monitor the execution of unit tests and report on which parts of the code under test are exercised. This allows a reasonable evaluation of which percentage of a software product is covered by automated tests.

While it is difficult to proclaim a generally valid minimum degree of test coverage, because it partly depends on the application at hand – e. g. completely covering trivial bean setters and getters is not usually very useful – values of 80% or above are advised to be sufficiently confident that refactorings and modifications will not break existing functionality.

Assuming an average test coverage of 85% – esp. including the 100 complex (and allegedly important) methods mentioned above – would that not imply a reasonably good code quality, because the source code is covered by tests for the most part?

Again, the answer must be: No.

Even high levels of test coverage only prove that the execution paths exercised by the tests are run at least once, with one particular set of test data. Although coverage tools do record how often each branch is executed, a single execution is enough for a branch to count as “covered”.
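A minimal sketch of this effect follows. The class `Discount` and its business rule are hypothetical, and plain checks are used instead of a test framework to keep the example self-contained. The two checks execute every line and both branches of the method, so a coverage tool would report it as fully covered, yet the boundary value is handled incorrectly.

```java
final class Discount {

    // Intended rule (assumed for this sketch): orders of 100 or more get 10% off.
    // The comparison uses '>' instead of '>=', so exactly 100 is handled wrongly.
    static int priceAfterDiscount(int price) {
        if (price > 100) {
            return price - price / 10;
        }
        return price;
    }

    static void check(boolean condition) {
        if (!condition) throw new AssertionError();
    }

    public static void main(String[] args) {
        // These two checks run every line and both branches once.
        check(priceAfterDiscount(200) == 180);
        check(priceAfterDiscount(50) == 50);

        // The boundary value was never tried, so the bug stays invisible:
        // the intended rule calls for 90, but the method returns 100.
        System.out.println(priceAfterDiscount(100));
    }
}
```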

Moreover, 85% coverage leaves 15% uncovered, and there is no immediate indication of which parts make up that 15%. Quite often this is code for error conditions or exception handling, where lurking bugs can have especially nasty consequences.
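As a small illustration of such an uncovered corner, consider the following hypothetical helper. A test suite that only feeds it well-formed numbers covers the `try` part and leaves the error handling untouched, which hides two questionable behaviours at once.

```java
// Hypothetical example; names and fallback behaviour are assumptions.
final class RetryConfig {

    static int parseRetries(String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Typically not exercised by tests: a malformed value silently
            // turns into "no retries" without any log message.
            return 0;
        }
    }
}
```

A happy-path test such as `parseRetries(" 3 ")` never reveals that `parseRetries("abc")` silently yields 0, nor that `parseRetries(null)` fails with a `NullPointerException` because the `catch` clause does not handle that case.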

And So On…

Everything that has been said so far can be applied to virtually all calculated metrics: Every automated analysis process can at most produce hints as to which parts of the code should be targeted for a manual review. The results provide starting points and allow a directed approach to large projects, but looking at them in isolation is never sufficient and can even be misleading.

In a recent case, good or sometimes even very good results of the initial automated metrics analysis runs, including – among others – cyclomatic complexity and Robert C. Martin’s metrics about levels of coupling and abstraction, conveyed a rather positive first impression of the subject project.

Even further diagnostics using static analysis tools like Checkstyle, FindBugs or Sonar did not report unusually high numbers of problems, relative to the overall size of the software product, and those issues that were reported would mostly have been rather easy to fix.

But despite the seemingly uncritical results of all tool runs, at the end of the review process we had found a number of severe problems in the code base that clearly prohibited the customer from going live with the new product. These included, among others, fundamental issues with concurrency, useless caches, severe flaws in error and exception handling, and obvious performance problems (unnecessary but frequent calls to remote services in tight loops).
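To give an idea of why such problems slip past the tools, here is a hedged sketch of the last category. It is not the code from the review; `ExchangeRateService`, `CountingRateService` and `InvoiceTotals` are invented for illustration. Both methods have the same low cyclomatic complexity and both can easily be covered completely by tests, yet one of them calls the (assumed) remote service once per item.

```java
import java.util.List;

// Hypothetical remote service; in a real system each call would cross the network.
interface ExchangeRateService {
    double rateFor(String currency);
}

// Test double that only counts how often it is called.
final class CountingRateService implements ExchangeRateService {
    int calls = 0;

    @Override
    public double rateFor(String currency) {
        calls++;
        return 2.0;
    }
}

final class InvoiceTotals {

    // One remote call per item, although the rate does not change inside the loop.
    static double totalSlow(List<Double> amounts, String currency, ExchangeRateService rates) {
        double total = 0;
        for (double amount : amounts) {
            total += amount * rates.rateFor(currency);
        }
        return total;
    }

    // Same result and same cyclomatic complexity (2), but a single remote call.
    static double totalFast(List<Double> amounts, String currency, ExchangeRateService rates) {
        double rate = rates.rateFor(currency);
        double total = 0;
        for (double amount : amounts) {
            total += amount * rate;
        }
        return total;
    }
}
```

For a list of three amounts, `totalSlow` triggers three calls and `totalFast` one. A complexity or coverage report shows no difference between the two; a reviewer who knows that `rateFor` is a remote call spots it immediately.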

Bottom Line

Judging the quality of a software product – and consequently the risk when using it in production – by tool-based measurements and metrics alone can easily lead to false conclusions.

Too many factors that influence the actual quality of a solution cannot reliably, if at all, be evaluated automatically. Despite lots of great and proven tools being readily available and even free to use, their results still require careful evaluation – they must be seen as the indicators that they are, not as comprehensive and final statements about quality. They can only point the way and hint at where it might be sensible to focus a manual review.

In the case mentioned above, using the software in production would have had far-reaching and potentially critical consequences, because data could have been corrupted silently or the system might have crashed completely.

Manual reviews and checks cannot guarantee error-free software either, but even in the IT business, experience and intuition – luckily – still cannot be replaced by tools.

Blog author

Daniel Schneller
