Performance optimization of a GraphQL app with Instana

21.7.2020 | 8 minutes reading time

“Works on my machine.” Okay, but we know quite well software never behaves the same when running on different machines… We knew that, but ran into unexpected performance issues when going live with a simple app. Here’s how we fixed the problem and improved performance.

This is about an existing GraphQL application, www.coolboard.fun, a kanban board app and Trello clone. It ran terribly slowly after going live because of performance issues caused by a rate-limited backend.

After the root cause was found (see the post from Robert Hostlowsky), the app is ready to be optimized, and we will see the improvements in the results.

Why did we not notice it earlier while developing? We were focused on delivering features, and when testing we were the only users. With more users, more problems emerged!

With appropriate monitoring we were able to find the bottlenecks caused by simple design flaws quickly.

As described in the previous blog post in detail, it was caused by a flawed design which was not visible while developing but easily found in production with Instana.

In this post we will describe how the load can easily be reduced by 50% and how the performance can greatly be improved.

We will remove a bottleneck in the API gateway.

Root cause

As mentioned in the previous post, our Gateway API always fires one additional GraphQL request for user data when the frontend fetches any data. You might already guess that this could somehow be related to authentication, right? And we will see, that is the right direction …

Inefficient Authorisation check

In GraphQL, the resolvers are the “workers” that fetch and provide each piece of data.
Each query field can have its own resolver method.
Each resolver method can implement a gate for checking authorisation: e.g. only an admin can “see” everything.
For each request there is a context which holds any specific info, e.g. all http request headers.
A context can also provide access to global services.
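
To make this concrete, here is a minimal sketch of a resolver map with a per-request context. It is not taken from the coolboard code: buildContext, the in-memory db and the simple header check are assumptions for illustration only.

```javascript
// Hypothetical sketch: `buildContext` and the in-memory `db` are made up for illustration.
// The context is built once per request and carries request info plus shared services.
const buildContext = (headers, db) => ({ headers, db });

const resolvers = {
  Query: {
    // Each query field has its own resolver with the signature (parent, args, context, info).
    board(parent, args, context, info) {
      // Gate: refuse to resolve the field when no authorization header was sent.
      if (!context.headers.authorization) {
        throw new Error("Not authorised");
      }
      return context.db.boards.find((b) => b.id === args.id) ?? null;
    },
  },
};
```

A real gate would verify the token instead of only checking that the header is present; the point here is just where the check lives and what the context provides.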

In our application, all resolvers verify the JWT OAuth token and check whether a user with that auth ID exists in the database.

In our first implementation this helper function getUserId() checks the authorization:

const getUserId = async (context) => {

   // 1. verify the authentication token (stored in `context`) and retrieve the authenticationID,
  const authenticationID = await retrieveAuthHeaderToken(context)  // 1

  if (authenticationID) {
  
    // 2. lookup the user with this authenticationID,
    const user = await context.db.query.user({where: { authenticationID }}) // 2
    if (user) {

      // 3. retrieve the user's database Id
      return user.id // 3
    }
  }
  
  throw new AuthError()
}

And this is how we used this helper in our GraphQL resolvers:

// resolvers/query.js

export const resolvers = {

  // get currently logged-in user
  async currentUser(parent, args, context, info) {
  
    const id = await getUserId(context)    // 1
    return context.db.query.user({ where: { id } }, info)
  },

  // get any specific board
  async board(parent, boardId, context, info) {
    await getUserId(context)    // 2
    
    return context.db.query.board({ where: { id: boardId } }, info)
  },

  async cardlist(parent, { where }, context, info) {
    await getUserId(context)    // 2
    
    return context.db.query.list({ where }, info)
  }
}

At first sight, this implementation seems to be correct. It is blocking any non-authenticated access.

“It works”… “Done!”… “Wait?!?”…

Can you spot the mistake?

currentUser() calls getUserId(), which already looks up the user in the database, and then loads the same user a second time by id.
board() and cardlist() never need the user's id at all, so why do they look up the user in the database? That lookup was not required.
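
The effect is easy to reproduce with a stand-in for the database that only counts queries. The sketch below is a simplification under assumptions: createCountingDb is hypothetical, token verification is stubbed out, and the resolvers take only the arguments they need.

```javascript
// Hypothetical stand-in for `context.db`: it only counts queries (not the real Prisma binding).
const createCountingDb = () => {
  const db = {
    calls: 0,
    query: {
      user: async ({ where }) => {
        db.calls += 1;
        return { id: "user-1", authenticationID: "auth0|123" };
      },
      board: async ({ where }) => {
        db.calls += 1;
        return { id: where.id, name: "Demo board" };
      },
    },
  };
  return db;
};

// Same shape as the helper above; the token check is replaced by a fixed ID.
const getUserId = async (context) => {
  const user = await context.db.query.user({ where: { authenticationID: "auth0|123" } });
  return user.id;
};

const currentUser = async (context) => {
  const id = await getUserId(context);              // backend call 1
  return context.db.query.user({ where: { id } });  // backend call 2: the same user again
};

const board = async (context, id) => {
  await getUserId(context);                         // backend call 1: the result is discarded
  return context.db.query.board({ where: { id } }); // backend call 2
};
```

Calling either resolver once increases db.calls by two, which matches the doubled number of backend requests.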

Authentication-check improved

We extract the part that only verifies that the OAuth token from the HTTP header is valid into its own function: ensureAuth0TokenValid()

Then we do the user lookup directly in the resolver itself, after extracting the authentication ID (part of OAuth token). The adapted resolvers are now:

// resolvers/query.js

export const Query = {
  async currentUser(parent, args, context, info) {
    // checks token from request header, and extracts oauth-id
    const authenticationID = await retrieveAuthHeaderToken(context)
    return await context.db.query.user({where: { authenticationID }})    
  },
  
  async board(parent, boardId, context, info) {
    // checks token from request header
    await ensureAuth0TokenValid(context)
    return context.db.query.board({ where: { id: boardId } }, info)
  },

  async cardlist(parent, { where }, context, info) {
    // checks token from request header
    await ensureAuth0TokenValid(context)    
    return context.db.query.list({ where }, info)
  }
}
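
The two helpers themselves are not shown in this post. A minimal sketch of how they could look follows. context.request.headers, context.verifyToken and the use of the sub claim as authentication ID are assumptions, and verifyToken stands in for whatever JWT verification is actually used.

```javascript
// Hypothetical sketch, not the original helpers.
// Assumes `context.request.headers` holds the HTTP headers and that
// `context.verifyToken(token)` resolves to the verified JWT claims or rejects.
const retrieveAuthHeaderToken = async (context) => {
  const header = context.request.headers.authorization ?? "";
  const [scheme, token] = header.split(" ");
  if (scheme !== "Bearer" || !token) {
    throw new Error("Not authorised");
  }
  const claims = await context.verifyToken(token);
  // Assumption: the `sub` claim carries the authentication ID.
  return claims.sub;
};

// Only verifies the token; no database access is involved.
const ensureAuth0TokenValid = async (context) => {
  await retrieveAuthHeaderToken(context);
};
```

The important property is that ensureAuth0TokenValid() never touches the backend, so it does not count against the rate limit.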

There is another simple optimization possible, because the mapping from authentication ID to user ID never changes.

We can hold that information in a lookup table. It has to be filled once per server lifecycle, so it is less suited to serverless lambdas.

// server.js  
const userIdByAuthIdLookup = {};  

export const lookupUserWithAuthId = async (authenticationID) => {
    return userIdByAuthIdLookup[authenticationID] ?? 
        (userIdByAuthIdLookup[authenticationID] = await db.query.user({where: { authenticationID }}))
 }

This simplifies the resolver:

  // ...
  async currentUser(parent, args, context, info) {
    const authenticationID = await retrieveAuthHeaderToken(context)
    return lookupUserWithAuthId(authenticationID)
  }

As a side effect this can also be used to optimize our GraphQL mutations which are using the user’s id, too:

// mutations.js
export const Mutations = {
    async createBoard(parent, { name }, context, info) {
        const authenticationID = await retrieveAuthHeaderToken(context)
        
        // lookupUserWithAuthId is async, so await it before reading the id
        const user = await context.lookupUserWithAuthId(authenticationID)
        const userId = user.id
        
        return context.db.mutation.createBoard({ data: { name, createdBy: userId } }, info)
    }
}
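
One detail of the lookup table is worth a note: it stores the user only after the first query has finished. When a board page fires several queries at once, each of them could still miss the cache and hit the backend. A possible variant, sketched here under assumptions (createUserLookup and fetchUser are hypothetical names), caches the pending promise instead:

```javascript
// Sketch: cache the pending promise per authentication ID, so that concurrent
// requests share a single backend call. `fetchUser` is a placeholder for the
// actual database query.
const createUserLookup = (fetchUser) => {
  const pendingByAuthId = new Map();

  return (authenticationID) => {
    if (!pendingByAuthId.has(authenticationID)) {
      const pending = fetchUser(authenticationID).catch((error) => {
        // Do not keep failed lookups in the cache, so a later request can retry.
        pendingByAuthId.delete(authenticationID);
        throw error;
      });
      pendingByAuthId.set(authenticationID, pending);
    }
    return pendingByAuthId.get(authenticationID);
  };
};
```

It would be created once at server start, for example with a fetchUser that wraps db.query.user({ where: { authenticationID } }), and then used in the resolvers like lookupUserWithAuthId().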

Summary:

We replaced the authentication verification with just the OAuth token verification for query operations.
We removed an unneeded database access for retrieving the data of the currently logged-in user.
We are now caching the result of the User lookup.

Verification

Let’s run this simple reproducible scenario:

We will trigger the opening of the board page in the browser 10 times with a 2-second delay. This creates “enough load”: the loading needs up to one minute to finish.

#!/bin/bash 
for i in {1..10} ; do \
   open https://localhost:3000/board/ck5sc7nis74vk0901gvvr42hi ; \
   sleep 2 ; \
done

Then we wait one minute to get out of the rate-limiting time slot and repeat the run. Repeating it once more gives us more solid stats:

openBoardPage_10times
sleep 60
openBoardPage_10times
sleep 60
openBoardPage_10times

This generates 3 sections we will see in the charts in our results below.

Expected result and comparison

After logging in once in the browser, we stay authenticated for the whole test. So, effectively, our frontend opens 3 × 10 = 30 board pages.

Our frontend will send these GraphQL requests to the API gateway server:

30 board queries
30 current-user queries
150 (=30*5) cards-list queries.

Our API gateway will then send requests to the GraphQL backend:

-> Before optimization this led to 420 calls in total:

30 user queries + (30 user queries for auth check)
30 board queries + (30 user-queries for auth check)
150 card-list queries + (150 user-queries for auth check)

-> After optimising it results in only 180 calls

The reduced number of calls is visible because all responses arrive in a shorter time, see (A).

With fewer requests, the number of waiting requests caused by rate-limiting is smaller, and the latency goes down (B).

In the end, we saved more than 50% of the load on our backend by reducing the number of requests from 420 to 180!
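
The numbers can be re-derived in a few lines. Note the assumption: the post does not break down the 180 calls, so the sketch reads them as the board and card-list queries only, with the current-user lookups answered from the lookup table (the one-time initial load is ignored).

```javascript
const pageLoads = 3 * 10;  // 3 runs with 10 board pages each
const listsPerBoard = 5;   // card lists fetched per board page

// GraphQL requests sent from the frontend to the API gateway.
const frontendQueries = {
  currentUser: pageLoads,
  board: pageLoads,
  cardList: pageLoads * listsPerBoard,
};
const frontendTotal =
  frontendQueries.currentUser + frontendQueries.board + frontendQueries.cardList;

// Before: every frontend query triggered one extra user query for the auth check.
const backendCallsBefore = frontendTotal * 2;

// After (assumption): no auth lookups, and currentUser is answered from the cache.
const backendCallsAfter = frontendQueries.board + frontendQueries.cardList;

const savedShare = 1 - backendCallsAfter / backendCallsBefore;
console.log(backendCallsBefore, backendCallsAfter, `${Math.round(savedShare * 100)}%`);
// prints: 420 180 57%
```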

Latencies

After optimization, the requests sent from the browser have lower latency than before.

Here, Instana gives us more interesting insights:

The (GraphQL) requests sent from our website to the API gateway show that the response times for loading the data on the page go down by a factor of 3; overall, all pages load in a shorter time.

Page load times for GraphQL requests

And finally the improved page load times reflect the improvement:

Here we see the result of our performance optimization: much smaller retrieval times!

Looking forward

The performance improvement shown was just some simple optimization to reduce the bottleneck.
We could try to optimize further: e.g. why not share the user ID with the client and send it together with the auth header? In the end, that would only save us one extra lookup for some GraphQL operations, but it would make the app insecure, because it exposes internal data. (Don’t do that!)
Finally, the natural performance limit, defined by the current backend, is reached.

Our learnings let us think about alternatives more wisely, for example:

The quick solution: lift the limit by paying more for the Prisma Cloud service.
We could host the Prisma database and run the Prisma server on our own.
This one also affects the UI frontend: we could change the API to allow loading the whole board with only one GraphQL request.
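
For the last alternative, a single nested query could replace the separate board and card-list requests. The sketch below only illustrates the idea: the field names lists and cards, the query name and the argument shape are assumptions about the schema, not the actual coolboard API.

```javascript
// Hypothetical query: `lists`, `cards` and the `where` argument are assumed schema details.
const BOARD_WITH_LISTS_QUERY = `
  query BoardWithLists($id: ID!) {
    board(where: { id: $id }) {
      id
      name
      lists {
        id
        name
        cards {
          id
          name
        }
      }
    }
  }
`;

// Builds the JSON body of one GraphQL-over-HTTP request for a whole board.
const buildBoardRequest = (boardId) => ({
  query: BOARD_WITH_LISTS_QUERY,
  variables: { id: boardId },
});
```

With such a query, opening a board page would cost one frontend request instead of one board query plus five card-list queries.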

Result, outcome and impact

We found that, once we understood the issue, improving the performance was not difficult. While the problem was hidden in development, it became noticeable in production, triggered by a third-party system.

Such small issues can easily slip through during design and implementation.
We used appropriate monitoring and tooling to locate it easily (part 1).
We verified the results of our optimization.

Even though this was only a small project, you can imagine how difficult troubleshooting can get on a bigger system or more distributed systems!

If you are interested in more details and want to try Instana, you can run a full-featured 14-day trial version.

There is also this post about how to install Instana on a Kubernetes cluster (German). You could also request a demo or run a proof of concept together with the APM team.

Blog author

Maximilian Mayer
