Performance Analysis of a GraphQL application with Instana

6.3.2020 | 9 minutes reading time

Modern IT landscapes typically consist of many different microservices. Replacing monoliths brings more complexity, because there are more parts and more dependencies between them.

A key aspect of running these systems is appropriate monitoring that can handle this complexity and observe system performance. It also needs to understand the different forms of communication, such as REST, gRPC, and GraphQL.

In this post we will analyse the performance issues of an existing application. In a follow-up blog post we will then present the solution and the resulting performance improvement.

A demo application (www.coolboard.fun) for a GraphQL online course was quickly implemented, set up, and running, but it ran into performance issues…

Screenshot: coolboard.fun web app

With more and more users, the performance went down faster than expected, resulting in:

Page load times above 1 second: some pages even took up to 15 seconds!
Failing end-to-end browser tests after running into timeouts!

While I was developing the app, I never ran into such issues, so I started wondering:

What was the main bottleneck? Spoiler alert: there is a subtle side effect caused by rate-limiting.
Is there an easy way to fix most (the typical 80%) of the performance issues quickly?
Are there at least some low-hanging fruits, since I sometimes tend to be lazy?
And, important in the long term: can we get insights that help us make the right decisions about changing the architecture and building blocks later?

Before searching for the root cause, we need to understand the overall structure of the application: the building blocks of the different (micro)services, their dependencies, and how they communicate.

High-level architecture and services

Web (SPA) -> API Server (BFF, Auth) -> Prisma Server (GraphQL – ORM mapping) -> DB

The single-page application (SPA) runs in the browser. It connects to Auth0.com for authentication and accesses the API Server, which provides a specific GraphQL API and handles authentication (aka backend-for-frontend). The API Server can be scaled up easily because it does no session handling: authentication is done only by exchanging JWT auth tokens.
User management and authentication are handled by a separate third-party service, Auth0.com.

The Prisma Server is an ORM that provides all the usual CRUD operations as GraphQL operations.

Observation

When I open a single board page with only a few lists of cards (aka lanes), the page loads fast.

But when several pages are opened simultaneously, or when there is more load, the performance drops and the page loads noticeably slowly!

At least the board’s title and its list names appear quickly because they are loaded first.

loading board page - first part

Then every list of cards is loaded and each lane is filled with its cards. Under some load, the response times can increase to more than 15 seconds, which is catastrophic!

board page has been fully loaded

If you are wondering why the application behaves this way, keep in mind that

the purpose of this application was to demonstrate the use of GraphQL,
development and testing were done on a local dev machine within a Docker environment,
there was neither much load nor any dedicated load testing before the app went live, so the problem went unnoticed, and
finally, it was not obvious that deploying to a cloud service would lead to such bad performance.

Now, I have some suspicions because I am using the free but limited version of Prisma cloud: the communication between my API-gateway and the Prisma cloud server is throttled and limited in some way (more details later).

But let’s start by figuring out how the services communicate with each other, with the help of some tools.

Analysis – Apollo Graph Manager

The easiest way to get some metrics was to activate the built-in tracing feature of the Apollo server, which sends query metrics to Apollo Graph Manager. After creating an account and an API key at https://engine.apollographql.com, we only need to turn tracing on by wrapping the server in our API-gateway:

Extend server to send metrics to Apollo Graph Manager

Every GraphQL request handled by our API-gateway gets logged, after any parameter values have been removed.

This gives us some insight into the communication between the website in the browser and the API-gateway. Even though the free version only keeps logs for the last 24 hours, it already shows that we ran more than 200 queries while opening the board page 29 times:

The response time of the CardList query ranges from 400 milliseconds to 14 seconds!

Finding 1: There are too many GraphQL requests triggered

One root cause may be the limitation or throttling of our free GraphCool/Prisma cloud server:

So far, we only have metrics for the GraphQL requests sent from the browser to the API-gateway.

We need to dive deeper now and inspect the communication between the API-gateway and the Prisma cloud GraphQL server, in order to understand which queries are slow and where the bottleneck is.

Analysis – APM with tracing

At this point I was looking for an application monitoring tool that is capable of understanding GraphQL. That means it must be able to recognize and differentiate the GraphQL queries, which are all sent as ordinary POST requests to the same endpoint (e.g. /graphql).
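To illustrate what "understanding GraphQL" means here, the following is a minimal sketch of how requests to one and the same endpoint can be told apart by looking at the POST body instead of the URL. It is not how Instana™️ implements it; the helper name `describeGraphQLRequest` and the sample queries (including their field names) are assumptions made up for this example.

```javascript
// Minimal sketch: classify GraphQL-over-HTTP requests that all hit the same
// endpoint by inspecting the JSON body.
// `describeGraphQLRequest` is a hypothetical helper, not part of any library.
function describeGraphQLRequest(rawBody) {
  const { query = '', operationName } = JSON.parse(rawBody);

  // Matches e.g. "query CardList(...)" or "mutation AddCard {".
  const match = /^\s*(query|mutation|subscription)\b\s*([A-Za-z_][A-Za-z0-9_]*)?/.exec(query);

  return {
    // A document that starts with "{" is an anonymous query (shorthand form).
    type: match ? match[1] : 'query',
    name: operationName || (match && match[2]) || 'anonymous',
  };
}

// Hypothetical request bodies; the schema fields are illustrative only.
const bodies = [
  JSON.stringify({ operationName: 'Board', query: 'query Board($id: ID!) { board(id: $id) { name } }' }),
  JSON.stringify({ query: 'query CardList($id: ID!) { list(id: $id) { cards { id } } }' }),
  JSON.stringify({ query: '{ me { name } }' }),
];

for (const body of bodies) {
  const { type, name } = describeGraphQLRequest(body);
  console.log(`POST /graphql -> ${type} ${name}`);
}
```

A tool that only looks at method and path would report three identical `POST /graphql` calls; grouping by operation name is what makes a slow CardList query visible as such.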

I checked well-known tools on the market, but only Instana™️ was capable of tracing this GraphQL communication.
Instana™️ also provides end-user monitoring (EUM) together with tracing of the communication between microservices down to database operations, which makes it an ideal tool for our use case.

We will use the SaaS version of Instana™️. Additionally, we need to run the Instana™️ agent in the same environment as our services. The agent sends the recorded monitoring data to the Instana™️ backend.
For this demo I can also start the agent in a local Docker environment and start the API-gateway there.
Compared to the production environment we will get different timings, but that is okay, as we just want to focus on the communication flow for now.

The setup and high-level architecture for our further analysis:

Let’s start with enabling end-user monitoring:
We need to define a website in Instana™️ and add the provided snippet to the web page, similar to embedding e.g. Google Analytics:

Everything works out of the box; we only need to set the name of the page by adding this JavaScript call at the end of the web page:
ineum('page', 'main-page')

To get full tracing and monitoring in our API-gateway, which runs on Node.js, we only need to run these lines before anything else. This activates code injection, so all requests and responses get traced automatically!
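The original code listing is not preserved in this text. As a hedged sketch, this is roughly what the first lines of the API-gateway's entry file could look like, based on the documented usage of the `@instana/collector` npm package. The `try`/`catch` guard and the `tracingEnabled` flag are additions for this example only, so that it also runs where the package is not installed.

```javascript
// Sketch based on the documented usage of the @instana/collector package:
// the require call has to be the first statement of the application, so the
// collector can instrument modules that are loaded afterwards.
// The try/catch and `tracingEnabled` are assumptions for this example; a real
// entry file would normally call the collector unguarded.
let tracingEnabled = false;
try {
  require('@instana/collector')();
  tracingEnabled = true;
} catch (err) {
  console.warn('Instana collector not available:', err.message);
}

// Only now load the rest of the application (server setup omitted here).
console.log(`tracing enabled: ${tracingEnabled}`);
```

The order matters: modules that are required before the collector is initialized may not be instrumented.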

Let’s start from the user’s perspective:

Instana™️ provides a “website view” where we can see how our boards page gets loaded with all its resources. After filtering for XHR/POST requests, we already see the requests needed for the board data:

One request, getting only the board’s name and its lanes’ titles
One extra request for each lane (= card list)

Although this looks pretty fine (load time below 1 second), the performance gets worse when more users load the board page. We can see that after clicking the button that opens the Analytics page and shows the backend traces.

We can see all XHR requests to the API-gateway (at localhost:4000) with their varying response times (in the last column):

We need to dive deeper into one of these traces to figure out how the API-gateway communicates with the Prisma cloud backend.
First, when filtering for all calls, we can see the varying response times in the right column again.

That already gives some indication for our issues!

Then, let’s see what happens in the background by selecting one call. This sheds some light on the communication between the API-gateway and the Prisma cloud backend:

Traces for the browser requesting the initial board metadata:

We find two sequential requests to the Prisma backend, made by the API-gateway one after the other:

First, it requests some user information.
Second, it retrieves the board data from the backend. (In the image above this request is selected, so we see the query details on the right side.)

Traces for the browser requesting one lane (=card list) with its cards:

Here, we also find an extra request for some user data (see the details on the right side)!

In the end, even though the frontend sends only 6 GraphQL requests, we end up with more than 12 calls to the Prisma backend!

Finding 2: There are unneeded extra requests by the API-gateway

In the analysis above, we found out that the API-gateway is requesting some unneeded and unexpected extra user data from the database backend (at eu1.prisma.sh), doubling the number of requests.
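The doubling can be reproduced with a small counting sketch. The fake backend, the function names, and the lane count of five (chosen so that one page load produces the six frontend requests mentioned above) are assumptions for illustration; the real gateway code is not shown in this post.

```javascript
// Hypothetical stand-in for the Prisma backend: it only records calls.
function createFakeBackend() {
  const calls = [];
  return {
    calls,
    request(operation) {
      calls.push(operation);
      return { operation };
    },
  };
}

// Gateway behaviour as seen in the traces: every incoming GraphQL request
// first looks up the user and then fetches the requested data.
function handleFrontendRequest(backend, operation) {
  backend.request('user');           // extra lookup on every request
  return backend.request(operation); // the data the frontend asked for
}

function loadBoardPage(backend, laneCount) {
  handleFrontendRequest(backend, 'board'); // board name and lane titles
  for (let lane = 0; lane < laneCount; lane += 1) {
    handleFrontendRequest(backend, 'cardList'); // one request per lane
  }
  return { frontendRequests: 1 + laneCount, backendCalls: backend.calls.length };
}

const result = loadBoardPage(createFakeBackend(), 5);
console.log(`${result.frontendRequests} frontend requests -> ${result.backendCalls} backend calls`);
```

In this model, half of all backend calls are user lookups, which is exactly the part that can be removed without touching the frontend.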

With that many requests we quickly run into the rate-limiting, which causes the varying latency…
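Why throttling leads to such widely varying response times can be sketched with a very simplified queueing model. The limit of 10 requests per second, the service time of 100 ms, and the function names are illustrative assumptions, not the real limits of the Prisma cloud service.

```javascript
// Simplified model of a throttled backend: it starts at most
// `requestsPerSecond` calls per second; excess calls wait for the next slot.
// `arrivalTimesMs` must be sorted ascending (calls are served first come,
// first served). All numbers are illustrative assumptions.
function simulateLatencies(arrivalTimesMs, requestsPerSecond, serviceTimeMs) {
  const slotMs = 1000 / requestsPerSecond;
  let nextFreeSlotMs = 0;

  return arrivalTimesMs.map((arrivalMs) => {
    const startMs = Math.max(arrivalMs, nextFreeSlotMs);
    nextFreeSlotMs = startMs + slotMs;
    return startMs + serviceTimeMs - arrivalMs; // latency seen by the caller
  });
}

// One page load is modelled as 12 backend calls arriving at the same time.
const burst = (pages) => Array.from({ length: pages * 12 }, () => 0);

for (const pages of [1, 5]) {
  const latencies = simulateLatencies(burst(pages), 10, 100);
  console.log(
    `${pages} page(s): fastest ${Math.min(...latencies)} ms, slowest ${Math.max(...latencies)} ms`
  );
}
```

In this model the first call is always fast, while the last call of a burst waits for all calls queued before it. That matches the pattern in the traces: the same query is sometimes quick and sometimes takes many seconds, and halving the number of backend calls also halves the worst-case wait.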

How can we solve this?

We quickly found a performance bottleneck and its cause. The main goal will be to reduce the overall number of GraphQL requests to the backend.

Even though the architecture and service structure were fully sufficient for a small demo, the best solution is obviously to migrate to a less limited GraphQL persistence service (e.g. FaunaDB) or to host a Prisma backend service on our own.

To fix the performance issues, we could also add caching in the API-gateway or collapse all GraphQL queries into one huge GraphQL query, but this means adapting the application.
Low-hanging fruit: as a quick measure, we should get rid of fetching the extra user information in each request by adapting our API-gateway server!
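One possible shape of such a change is a short-lived cache for the user lookup in the API-gateway. This is only a sketch under assumptions, not necessarily the fix used in the follow-up post: `createUserCache`, the token-based key, and the 60-second lifetime are hypothetical.

```javascript
// Hypothetical sketch: cache the user lookup per auth token for a short time,
// so repeated requests of the same session skip the extra backend call.
function createUserCache(fetchUser, ttlMs, now = Date.now) {
  const entries = new Map(); // token -> { user, expiresAt }

  return function getUser(token) {
    const hit = entries.get(token);
    if (hit && hit.expiresAt > now()) {
      return hit.user;
    }
    const user = fetchUser(token); // the only place that calls the backend
    entries.set(token, { user, expiresAt: now() + ttlMs });
    return user;
  };
}

// Usage with a fake backend lookup that counts its calls.
let backendLookups = 0;
const getUser = createUserCache((token) => {
  backendLookups += 1;
  return { id: `user-for-${token}` };
}, 60000);

for (let i = 0; i < 6; i += 1) {
  getUser('token-abc'); // the six frontend requests of one page load
}
console.log(`user lookups sent to the backend: ${backendLookups}`);
```

If the real lookup is asynchronous, the same idea works by storing the returned promise in the map. The trade-off is that changes to a user only become visible after the cache entry expires.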

Conclusion

To find performance issues, it is necessary to have the right tools: not only to monitor performance, but also to analyse it easily and find problems quickly.
The Apollo Engine helped to get some quick statistics first, but it is limited to GraphQL-specific operations.
With Instana™️ we additionally get a bigger, more detailed picture, and we can also find the bottleneck in the communication of the whole system, via GraphQL and other protocols.

For this post we used Instana™️ to detect the performance issues with a limited view of only one part of the system. You can imagine how effective this can be when it monitors all parts of a whole production system, and when you can also use its advanced alerting features!

If you are interested in more details and want to try Instana™️, you can run a full-featured 14-day trial version. There is also this post about how to install Instana on a Kubernetes cluster (German).
You could also request a demo or run a PoC together with the APM team.

As we now have an idea what the root problem is, we can improve the performance by reducing the load on the backend by 50% with only little effort. Check out part two: Performance Optimization of a GraphQL app with Instana.

Blog author

Robert Hostlowsky
