No time for monitoring?

1.9.2010 | 6 minutes reading time

Monitoring big, distributed Java landscapes helps tremendously to keep complex applications under control. But many administrators skip the effort of setting up monitoring because they have no time. Now a time-saving solution is in sight.

“We are maxed out anyway. We need a solution that makes our work more effective, not something that, if we are lucky, saves about as much time as it takes to set up and maintain.”
I hear statements like this again and again from IT administrators. The effect is that APM solutions are mainly used by experts for firefighting.

So, what is needed? A solution that monitors a large number of applications with a minimum of configuration effort and identifies the root cause of problems quickly.

Indeed, I found and tested a tool that fulfills those requirements. AppDynamics has developed a product that is convincing not least because of its ease of use. I was sceptical at first but have not been disappointed in a couple of evaluations. It is almost as easy as an iPhone or Android app: simply use it.

The 3 steps towards 24×7 monitoring

Let’s take a look at the steps needed to establish application monitoring and how the AppDynamics solution adds value and saves time.

1. What to measure? – Measuring Points

The definition of measuring points (or sensors, probes) is the first challenge. Most APM solutions for Java or .NET use bytecode instrumentation (BCI) to collect performance data. Because every measuring point executes additional code, the points need to be chosen very carefully so that the overhead does not distort the results. This usually requires the assistance of an expert, an architect or a developer, for every application that needs to be monitored.
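To make the overhead argument concrete, here is a minimal sketch of what a single measuring point does. It is not bytecode instrumentation: as a stand-in it wraps an interface with a `java.lang.reflect.Proxy`, and the class `MeasuringPointSketch` with its two maps is made up for illustration. Every call through the wrapper pays for two clock reads and two map updates, which is why the number and placement of measuring points matter.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for an injected measuring point (assumption: proxy instead of BCI). */
class MeasuringPointSketch {
    // Per "Interface.method" key: number of calls and accumulated time in nanoseconds.
    static final Map<String, LongAdder> callCounts = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    /** Returns a wrapper around target that times every call made through iface. */
    @SuppressWarnings("unchecked")
    static <T> T measured(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the exception of the wrapped method
            } finally {
                // This bookkeeping is the extra code that runs on every measured call.
                String key = iface.getSimpleName() + "." + method.getName();
                callCounts.computeIfAbsent(key, k -> new LongAdder()).increment();
                totalNanos.computeIfAbsent(key, k -> new LongAdder())
                          .add(System.nanoTime() - start);
            }
        };
        return (T) Proxy.newProxyInstance(
                MeasuringPointSketch.class.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        Runnable task = measured(Runnable.class, () -> Math.sqrt(42.0));
        task.run();
        System.out.println(callCounts + " " + totalNanos);
    }
}
```

Wrapping a `Runnable` this way records each `run()` call under the key `Runnable.run`. A real agent injects comparable bookkeeping into the bytecode of the chosen methods, so a badly placed point on a hot method multiplies this cost.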

With agile development processes this is exhausting, as classes can change daily and new frameworks are added. A “trial-and-error” approach in production is prohibitive because the application servers have to be restarted most of the time. In addition, the overhead can inadvertently rise to a level that is unbearable for the users.

AppDynamics uses a patent-pending technology that needs only a minimum of BCI and is still capable of delivering method-level information to identify “loitering” components, and it does so without any configuration effort. The architect or developer can do their day job without being bothered by the admin.

2. How to get an overview? – Visualization

Dashboards are commonly used to provide an overview of the architecture (which component talks to which, and how often?) and of the business transactions (which transaction is behaving badly, and who is affected?) for all involved applications.

Most vendors use “customizable dashboards” for visualization as a kind of panacea where every view can be adjusted for every type of user. And that is exactly what has to be done, for every detail and every application: “mustomizable dashboards”, so to speak. Any change in the environment or the business functionality requires additional effort.

AppDynamics dashboards are created automatically and determine business transactions based on the “inner” values of an application (e.g. Struts actions, URI patterns or HTTP parameters). If the default settings do not fit, they can be changed with a few clicks and the system is ready for action.
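As an illustration of URI-based grouping, the following sketch names a business transaction after the first path segments of a request URI. How AppDynamics actually derives its names is not described here; the class `TransactionNamingSketch`, the method `nameFor` and the two-segment default are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical sketch: group requests into business transactions by URI prefix. */
class TransactionNamingSketch {

    /** Keeps the first `segments` path segments; the query string is ignored. */
    static String nameFor(String uri, int segments) {
        String path = uri.split("\\?", 2)[0];          // drop "?param=value"
        String name = Arrays.stream(path.split("/"))
                .filter(s -> !s.isEmpty())             // skip empty parts around slashes
                .limit(segments)
                .collect(Collectors.joining("/"));
        return name.isEmpty() ? "/" : "/" + name;
    }

    public static void main(String[] args) {
        // Both requests end up in the same business transaction "/shop/checkout".
        System.out.println(nameFor("/shop/checkout/confirm?orderId=42", 2));
        System.out.println(nameFor("/shop/checkout/cancel?orderId=43", 2));
    }
}
```

With a rule like this, new pages below an existing prefix fall into an existing transaction automatically, and changing the default amounts to changing one number rather than rebuilding a dashboard.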

AppDynamics - Application Overview

AppDynamics Application Flow Map

3. Red Alert! Something is going wrong. – Thresholds

What defines a problem in production? Usually something out of the norm, e.g. a user login takes three times as long as is normal for that time of day, or a JVM uses excessive amounts of CPU. Such abnormalities become visible with the help of predefined thresholds, where a violation results in an incident or alert.

What I see in the real world are 100 or more applications with a multitude of different business transactions that have very diverse “normal” response times: sometimes 2 seconds is very good (cost calculation for an insurance policy), sometimes 200 ms is a catastrophe (placing a bet on an online betting platform). Or worse: there are no non-functional requirements defined at all, so the thresholds have to be set by rolling dice initially and adjusted later. With only 50 applications with 50 transactions each, we have a stunning 2500 thresholds that need to be set and checked. On a regular basis. And we have only looked at response times so far…

With AppDynamics this is not needed. Slick baselining and statistical methods such as standard deviation are used to automate this work. You can adjust each value individually if needed, but 95% of all thresholds are already covered by the default rules. This includes time-of-day and day-of-week differences: on Monday mornings, for example, the login process takes longer because of the load and will not raise an alert, although the same response time causes an incident 2 hours later or on Tuesday morning, as it is above the norm for that timeframe.
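The idea of a time-aware baseline can be sketched in a few lines. This is not the AppDynamics algorithm, which is not documented here; it is a minimal illustration that keeps one running mean and standard deviation per (day of week, hour) bucket and flags values above mean plus a number of standard deviations. The class `BaselineSketch`, the bucket key and the parameters `sigmas` and `minSamples` are assumptions.

```java
import java.time.DayOfWeek;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one response-time baseline per (day of week, hour of day). */
class BaselineSketch {

    /** Running mean and variance, updated with Welford's online algorithm. */
    static final class Stats {
        long count;
        double mean;
        double m2; // sum of squared deviations from the current mean

        void add(double value) {
            count++;
            double delta = value - mean;
            mean += delta / count;
            m2 += delta * (value - mean);
        }

        double stdDev() {
            return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
        }
    }

    private final Map<String, Stats> buckets = new HashMap<>();
    private final double sigmas;   // assumed tolerance in standard deviations
    private final long minSamples; // no alerts until a bucket has this much history

    BaselineSketch(double sigmas, long minSamples) {
        this.sigmas = sigmas;
        this.minSamples = minSamples;
    }

    private static String key(DayOfWeek day, int hour) {
        return day + "@" + hour;
    }

    void record(DayOfWeek day, int hour, double responseMillis) {
        buckets.computeIfAbsent(key(day, hour), k -> new Stats()).add(responseMillis);
    }

    /** True if the value exceeds mean + sigmas * stdDev of its own time bucket. */
    boolean isAbnormal(DayOfWeek day, int hour, double responseMillis) {
        Stats s = buckets.get(key(day, hour));
        if (s == null || s.count < minSamples) {
            return false;
        }
        return responseMillis > s.mean + sigmas * s.stdDev();
    }

    public static void main(String[] args) {
        BaselineSketch login = new BaselineSketch(3.0, 5);
        for (double ms : new double[] {900, 1000, 1100, 950, 1050}) {
            login.record(DayOfWeek.MONDAY, 9, ms);   // busy Monday morning
        }
        for (double ms : new double[] {280, 300, 320, 290, 310}) {
            login.record(DayOfWeek.MONDAY, 11, ms);  // two hours later
        }
        System.out.println(login.isAbnormal(DayOfWeek.MONDAY, 9, 1050));
        System.out.println(login.isAbnormal(DayOfWeek.MONDAY, 11, 1050));
    }
}
```

In the `main` example the same 1050 ms login is within the norm for the 9 o'clock bucket but far above it for the 11 o'clock bucket, which is the Monday-morning behaviour described above. No threshold was typed in by hand; each bucket derives its own from the recorded history.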

4. And what about root cause analysis? (Bonus step)

Alerting in case of problems is nice and necessary: the admin knows that something went wrong, or is about to go wrong, in advance. But who should be notified to fix it? Triage and root cause analysis capabilities complete the monitoring. This means identifying the person responsible for resolving the problem and, in addition, giving them the details needed to return to normal quickly.

I stated before that AppDynamics instruments very little bytecode. How are the necessary details retrieved, then? AppDynamics uses so-called snapshots, which include a call stack with timings and details about the transaction itself. Snapshots are taken automatically of abnormal transactions (too slow, erroneous, etc.), on demand, and on a schedule (e.g. every 10 minutes or every 100th occurrence). With this technology an administrator is spared a tsunami of data but has exactly the necessary information when he or she needs it.
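The snapshot triggers listed above can be expressed as a small decision rule. The following is a sketch under assumptions, not the product's implementation: the class `SnapshotPolicySketch`, the `Reason` enum, the order in which the rules are checked and the concrete numbers are all hypothetical, and on-demand snapshots are left out.

```java
/** Hypothetical sketch: decide per transaction whether a detailed snapshot is kept. */
class SnapshotPolicySketch {
    enum Reason { NONE, ERROR, SLOW, EVERY_NTH, TIME_BASED }

    private final long slowThresholdMillis; // "too slow" limit, assumed to be given
    private final long everyNth;            // e.g. 100 for every 100th occurrence
    private final long intervalMillis;      // e.g. 600_000 for every 10 minutes
    private long seen;                      // transactions observed so far
    private long lastTimedSnapshotAt;       // time of the last time-based snapshot

    SnapshotPolicySketch(long slowThresholdMillis, long everyNth,
                         long intervalMillis, long startMillis) {
        this.slowThresholdMillis = slowThresholdMillis;
        this.everyNth = everyNth;
        this.intervalMillis = intervalMillis;
        this.lastTimedSnapshotAt = startMillis;
    }

    /** Abnormal transactions always win; sampling rules apply to the normal ones. */
    Reason decide(long nowMillis, long durationMillis, boolean failed) {
        seen++;
        if (failed) {
            return Reason.ERROR;
        }
        if (durationMillis > slowThresholdMillis) {
            return Reason.SLOW;
        }
        if (seen % everyNth == 0) {
            return Reason.EVERY_NTH;
        }
        if (nowMillis - lastTimedSnapshotAt >= intervalMillis) {
            lastTimedSnapshotAt = nowMillis;
            return Reason.TIME_BASED;
        }
        return Reason.NONE;
    }

    public static void main(String[] args) {
        SnapshotPolicySketch policy = new SnapshotPolicySketch(2_000, 100, 600_000, 0);
        System.out.println(policy.decide(1_000, 150, false));   // normal and fast
        System.out.println(policy.decide(2_000, 5_000, false)); // too slow
        System.out.println(policy.decide(3_000, 100, true));    // failed
    }
}
```

The point of such a rule is the ratio: expensive call-stack details are only collected for the few transactions that return something other than `NONE`, while all others are counted and timed cheaply.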

In the coming weeks we will publish a series of blog posts on how to diagnose different kinds of performance problems in detail.

Simple and effective

In summary: AppDynamics has created an easy-to-use and effective solution in which I see the promises of the last seven years kept: a simple system developed specifically for monitoring highly distributed, business-critical Java applications.

Revolutionary? No, rather evolutionary. AppDynamics paid attention to the shortcomings of existing solutions and put a lot of thought into automation. “2-3-100” is the goal: 2 administrators take 3 days to set up 100 applications for monitoring.

While the first providers of APM solutions for Java and .NET aimed to open the black box and get some data at all, the second generation expanded this to transactions in order to x-ray modern SOA/SBA-based applications. What was missing was usability and automation: how can I effortlessly sort my data and turn it into valuable information?

Let’s take a look into the next generation of APM!

Put an agent into an application (see the AppDynamics Lite screencast by Fabian), let it send data to the central controller and simply wait for the first results to reveal themselves.

Blog author

Rainer Schuppe
