Caching de luxe with Spring and Guava

14.3.2016 | 13 minutes of reading time


We generally don’t optimize expensive operations in code until they create a bottleneck. In some of these cases you could benefit a lot from caching such data. The Spring solution is non-intrusive, highly configurable yet easy to set up, and fully testable. But if your business domain is a bad fit, caching can do more harm than good. Rather than delve into the technical implementation details, this post explores the functional ramifications of caching with some practical examples, available in a demo application on GitHub.

If you’re an experienced developer I assume you are familiar with the concept of caching. There are plenty of tutorials on the Spring caching framework, but to my taste they dive too quickly into the nitty-gritty of configuration without first distinguishing the good use cases from the less ideal candidates. Such decisions have everything to do with your system’s business rules. I will present three concrete and very different examples that at first glance are not ideal candidates for caching, but can still benefit from it if properly configured. Then we will look at ways to properly test a richly configured cache implementation. I deliberately leave out the finer details of advanced configuration. You can read all about them in the official Spring docs.

Make a spoonful of broth. Fifteen times.

Sometimes you have to take radical measures to convince your colleagues why some technology is useful and fun, so please bear with me when I start you off with a culinary analogy.

If you take your cooking seriously you’ll keep your pots, implements and jars (no, not jar files) within easy reach, especially when you are going to use them often. You don’t run back and forth to the cupboard – much less open and close it – every time you need to add a pinch of salt, do you now? To stretch the argument to breaking point: when you need to add a spoonful of broth every five minutes to your softly boiling risotto, do you boil a spoonful of water, make the broth, add it to the rice, clean the pan, put it away, and repeat this process fifteen times? Or do you prepare half a liter of broth before boiling the rice? A rhetorical question if ever there was one, yet this is exactly how we write our code most of the time: with repeated calls to relatively expensive operations that return exactly the same broth every time. All because we think in seconds instead of nanoseconds.

Crossing an A4 sheet at light speed

We are extravagantly wasteful with computer time because human consciousness operates in seconds, a pace many orders of magnitude slower than that of computers. Computers work in nanoseconds, which is hardly time at all. A nanosecond is one billionth of a second. It is to a second as a second is to thirty years. Light travels the length of an A4 sheet within a nanosecond. Got that?
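That last claim is easy to verify: light covers just under 30 centimetres in a nanosecond, almost exactly the long side of an A4 sheet (29.7 cm). A quick back-of-the-envelope check in plain Java:

```java
public class NanosecondScale {
    public static void main(String[] args) {
        double speedOfLight = 299_792_458.0; // metres per second, exact by definition
        double nanosecond = 1e-9;            // seconds

        // Distance light travels in one nanosecond, converted to centimetres
        double distanceCm = speedOfLight * nanosecond * 100;
        System.out.printf("Light travels %.1f cm in one nanosecond%n", distanceCm);
        // An A4 sheet is 29.7 cm long, so the comparison holds up nicely.
    }
}
```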

Usability research shows that any response under 0.1 seconds (100 million nanoseconds) is perceived as instantaneous. We can’t tell if a web page returns in 10 microseconds or 50 milliseconds, and so notice no improvement. That’s how slow we are, even when sober. I recently started caching the results of a common database query and even without network IO the performance increase was more than twentyfold:

Local fetch from cassandra database: 2100 microseconds
Fetching from Guava cache:             78 microseconds

The figures are naturally much worse with a networked database (that is, everywhere but in development), making the case for caching even greater. To make it visual:

Caching takes a tiny 78 microseconds, whereas a database fetch takes (drum roll) a whopping 2100 microseconds.

In kitchen terms it’s having the pepper within reach (78 centimetres) or having to fetch it from the garden shed.

It’s tempting to ignore performance penalties just because you don’t notice them. It’s also tempting to over-use caching once you get a taste for it. The smart aleck who keeps insisting that premature optimization is the root of all evil has a point. So let’s look at sensible and not so sensible use cases for caching.

The use case from heaven

A little refresher: a cache sits between a source (database/webservice) and a client and builds a lookup table (usually a hashmap) of unique keys and values, representing the distinct inputs to the source and their return values. When the source is queried again with the exact same input, the cache intervenes and returns the saved value instead. Any non-void method could be enhanced by caching, but the ideal candidate would be a method that:

  • behaves like a pure function: input A always returns B without side effects so cached entries never go stale.
  • accepts a limited range of inputs (for example an enumeration of all countries), so the cache can never grow beyond the number of entries in that enumeration.
  • is expensive to execute in terms of resources or duration and thus makes it worthwhile to cache in the first place.
  • is queried often with an even distribution of arguments, so every cached entry is retrieved regularly and evenly.
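Such an ideal candidate is essentially a memoized pure function. Stripped of all framework support, the core mechanism is nothing more than a map from input to result. A minimal plain-Java sketch (the expensiveLookup method is a made-up stand-in for a costly operation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class MemoizedLookup {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger sourceCalls = new AtomicInteger();

    // Pure function: the same input always yields the same output, no side effects
    private String expensiveLookup(String key) {
        sourceCalls.incrementAndGet(); // pretend this is a slow database call
        return "value-for-" + key;
    }

    public String get(String key) {
        // computeIfAbsent only hits the source on a cache miss
        return cache.computeIfAbsent(key, this::expensiveLookup);
    }
}
```

Repeated calls with the same key hit the source exactly once; everything after that is a map lookup.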

To cache or not to cache

Real-world use cases are probably nothing like this. You typically cache calls to databases or web services whose return values have a use-by date and therefore should not live indefinitely in the cache. There must be an eviction policy for stale entries. When designing a cache you must know how often the source data is likely to change and – more importantly – whether it’s acceptable to return stale data. This depends on the type of data and who uses it. Accurate readings of physical phenomena change continuously, but if the increments are small it may be acceptable to cache up to a few minutes and return stale data.

Some operations never return stale data but may allow a wide range of input, leading to a bloated cache with correspondingly high memory consumption. What if the input values are not evenly distributed? Then some cache entries occupy precious memory but are never queried and you end up with an in-memory copy of your database. That’s when you know you’re doing it wrong. The Spring tutorial gives an example of a books cache identified by ISBN number. Good as a tutorial but probably not something to implement for real, given the millions of possible ISBN numbers.
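The usual defence against a bloated cache is a size cap with least-recently-used eviction, which is what Guava’s maximumSize gives you. The idea can be sketched without any framework using a LinkedHashMap in access order (the class name and capacity are arbitrary):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedLruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least- to most-recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the cap is exceeded
        return size() > maxEntries;
    }
}
```

A BoundedLruCache with a cap of 2000 would never hold more than 2000 entries, no matter how many distinct keys a client throws at it.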

A temperature cache

Let’s say that the Dutch Meteorological Office has a hundred online weather stations accessible over a web API that return an accurate temperature reading expressed as a floating-point number, e.g. 18.75° C.

  • The readings of the thermometers change continuously, so the cache is always stale. Let’s say it is alright to return ten-minute-old readings. After that the entry should be evicted.
  • There are a hundred possible input arguments (the weather station’s ID), so the cache size never exceeds that number. No problem there.
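Time-based expiry (what Guava’s expireAfterWrite provides) can also be sketched framework-free by storing a timestamp with each value. An illustrative snippet, with the source of values passed in as a function:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long writtenAtMillis;
        Entry(V value, long writtenAtMillis) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final Function<K, V> source;

    public ExpiringCache(long ttlMillis, Function<K, V> source) {
        this.ttlMillis = ttlMillis;
        this.source = source;
    }

    public V get(K key) {
        long now = System.currentTimeMillis();
        Entry<V> cached = entries.get(key);
        if (cached == null || now - cached.writtenAtMillis >= ttlMillis) {
            // Cache miss or expired entry: re-read from the source and reset the clock
            cached = new Entry<>(source.apply(key), now);
            entries.put(key, cached);
        }
        return cached.value;
    }
}
```

With a ttl of ten minutes, a reading is fetched from the weather station at most once per ten-minute window per key.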

A postcode cache

The service that will access our new temperature cache expects a Dutch postcode and finds the weather station nearest to it. A single database table maps all valid postcodes to the nearest weather station and we want to cache those requests. What’s different about this case?

  • Postcode to weather station mappings never change, so the cache can never go stale. However…
  • Dutch postcodes are expressed as four digits and two capital letters, meaning there are roughly 6.7 million possibilities (9999 * 26 * 26). A disgruntled employee could write a script to try them all out and cause some real OutOfMemory discomfort. Clearly with such a big input range we don’t want the cache to become a memory hog. Let’s assume that a little log analysis has shown that 95% of queries are for 2000 distinct postal codes. We can then safely set the maximum cache size to 2000 entries and evict those that have not been read for a day.
  • Most well-formed postal codes are not assigned to actual streets and are therefore not in the database. The cache should be allowed to hold null values for these keys, so the database is not queried in vain for the same key, whether valid or not.

A stock exchange cache

The last example is a service that queries a remote API to cache the current price for a given share.
DISCLAIMER: I know nothing about financial markets. For example’s sake let’s assume prices change no more frequently than once every five minutes.

  • Stale values are not acceptable. A cached entry must be replaced as soon as the source changes.
  • The input range (number of different shares) is limited, so no size restriction is necessary.

Can I please see some code???

I know you’ve been itching for this:

git clone
cd caching-demo
mvn clean install
cd target
java -jar caching-demo-1.0-SNAPSHOT.jar

This will start up the Spring Boot demo application, which exposes two endpoints. Supply a valid four digit/two letter postcode for {postcode} (e.g. 1000AA) and for {share} one of AKZO, SHELL, ASML, UNILEVER, GOOGLE or FACEBOOK.

http://localhost:8080/share/{share}

Spring provides a caching abstraction and leaves the actual storage implementation to third party providers. The default implementation (backed by a concurrent hashmap) is only useful for vanilla flavoured Hello-World-Foobar situations. Luckily Spring provides adaptors for more powerful cache implementations, such as Guava Cache, which we will use here.
The CacheManager is a bean that manages our three caches (key/value maps) and needs to be set up as follows (see nl.jsprengers.caching.CacheConfig):

    @Bean
    public CacheManager cacheManager() {
        SimpleCacheManager simpleCacheManager = new SimpleCacheManager();
        simpleCacheManager.setCaches(Arrays.asList(
                buildPostCodeCache(),
                buildTemperatureCache(),
                buildSharesCache()
        ));
        return simpleCacheManager;
    }

The following three private methods create and configure our Guava caches. Note how all configuration parameters can – and probably should – be made configurable using @Value annotations. These values are set once during configuration, but there’s nothing to stop you from accessing the CacheManager elsewhere in your code to retrieve and reconfigure the caches at runtime, as we’ll see in the section on integration testing.

    @Value("${cache.postcode.max.size:1000}")
    private int postcodeMaxSize;

    private GuavaCache buildPostCodeCache() {
        return new GuavaCache(POSTCODE_CACHE, CacheBuilder
                .newBuilder()
                .maximumSize(postcodeMaxSize)
                .expireAfterAccess(1, TimeUnit.DAYS)
                .build(),
                true);
    }

The postcode cache entries never go stale, but neither should you keep them around if nobody needs them, so after a day Guava should evict them. The size of the cache is limited to a configurable number using Spring’s property injection (default 1000). Tip: if you set the maximumSize to zero you effectively disable the cache, which can be useful in a test run without rebuilding the source.

    @Value("${cache.expire.temperature.seconds:600}")
    private int expiryTemperatureSeconds;

    private GuavaCache buildTemperatureCache() {
        return new GuavaCache(TEMPERATURE_CACHE, CacheBuilder
                .newBuilder()
                .expireAfterWrite(expiryTemperatureSeconds, TimeUnit.SECONDS)
                .build(),
                false);
    }

Entries in the temperature cache must be evicted after ten minutes so the service can get fresh values from the weather station. There’s no need to set a cap on the number of entries.

    private GuavaCache buildSharesCache() {
        return new GuavaCache(SHARES_CACHE,
                CacheBuilder.newBuilder().build(), false);
    }

The shares cache is the easiest to configure, because eviction of stale entries is not managed by Guava.

The cached resources

Caching in TemperatureService and PostcodeService is very simple. There’s really nothing more to it than the @Cacheable annotation with a reference to the cache name:

From TemperatureService:

    @Cacheable(CacheConfig.TEMPERATURE_CACHE)
    public float getTemperatureForCoordinate(int coordinate) {
        return weatherStation.getForCoordinate(coordinate);
    }

From PostcodeService:

    @Cacheable(CacheConfig.POSTCODE_CACHE)
    public PostCode getPostcode(String code) {
        return postcodeDao.findByCode(code);
    }

The SharesService takes a bit more planning because it has to notify the cache whenever fresh information about share prices comes in. The external notification occurs by calling the setNewSharePrice method annotated with @CachePut. At first sight this method doesn’t seem to do much, but Spring uses the share parameter (identified by the key property) and the return value to update the cache entry. Another option would be a void method annotated with @CacheEvict, providing only the share name. This would kick out the entry, after which a call to getValue queries the exchange service and updates the cache. Which option is suitable depends on your setup. @CachePut probably generates less network traffic.

@Service
public class SharesService {
    private static Logger LOGGER = LoggerFactory.getLogger(SharesService.class);

    @Autowired
    StockExchange exchange;

    @CachePut(cacheNames = CacheConfig.SHARES_CACHE, key = "#share")
    public float setNewSharePrice(String share, float nextValue) {
        LOGGER.info("Share {} was updated to {}", share, nextValue);
        return nextValue;
    }

    @Cacheable(CacheConfig.SHARES_CACHE)
    public float getValue(String stockName) {
        LOGGER.info("Fetching stock {} from exchange", stockName);
        return exchange.getValue(stockName);
    }
}
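The difference between the two update strategies can be sketched without Spring: @CachePut corresponds to writing the new value straight into the map, @CacheEvict to removing the entry and letting the next read repopulate it. A framework-free illustration (the class and fetchFromExchange are made up for the example):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class UpdateStrategies {
    private final Map<String, Float> cache = new HashMap<>();
    final AtomicInteger exchangeCalls = new AtomicInteger();

    // Stand-in for the remote stock exchange
    private float fetchFromExchange(String share) {
        exchangeCalls.incrementAndGet();
        return 42.0f;
    }

    public float getValue(String share) {
        return cache.computeIfAbsent(share, this::fetchFromExchange);
    }

    // @CachePut style: the notification carries the new price, no extra fetch needed
    public void priceUpdated(String share, float newPrice) {
        cache.put(share, newPrice);
    }

    // @CacheEvict style: drop the entry; the next getValue() hits the exchange again
    public void priceInvalidated(String share) {
        cache.remove(share);
    }
}
```

The put variant costs no extra round trip to the exchange, while the evict variant costs one per invalidation, which is why @CachePut probably generates less traffic.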

Caching in action

You can see caching in action if you run the application with the application property cache.expire.temperature.seconds set to a value of, say, 15 seconds.


Here’s a little excerpt from the log when hitting the REST server with two different postal codes at varying intervals. Every call is logged by the Controller class, but PostcodeService and TemperatureService only log when the actual method body is accessed. If a log line is missing, that means the response came from the cache.

Postcode 1000AA not yet cached, station 10 not yet cached:

08:39:41.915 Controller : GET temperature for postcode 1000AA
08:39:41.923 PostcodeService : Getting postcode 1000AA from dbase
08:39:42.070 TemperatureService : Getting temperature from weather station 10

Postcode 1000AB not yet cached, station 10 still in cache

08:39:52.130 Controller : GET temperature for postcode 1000AB
08:39:52.130 PostcodeService : Getting postcode 1000AB from dbase

Postcode 2000AA not yet cached, station 20 not yet cached

08:40:04.075 Controller : GET temperature for postcode 2000AA
08:40:04.075 PostcodeService : Getting postcode 2000AA from dbase
08:40:04.077 TemperatureService : Getting temperature from weather station 20

Postcode 2000AB not yet cached, station 20 has expired (>15 seconds since last call)

08:40:22.677 Controller : GET temperature for postcode 2000AB
08:40:22.677 PostcodeService : Getting postcode 2000AB from dbase
08:40:22.692 TemperatureService : Getting temperature from weather station 20

Postcode 2000AB in cache, station 20 has expired

08:40:45.786 Controller : GET temperature for postcode 2000AB
08:40:45.787 TemperatureService : Getting temperature from weather station 20

Postcode 2000AB in cache, station 20 still in cache

08:40:56.426 Controller : GET temperature for postcode 2000AB

Postcode 2000AB in cache, station 20 has expired

08:41:02.293 Controller : GET temperature for postcode 2000AB
08:41:02.294 TemperatureService : Getting temperature from weather station 20

But how do I test all this?

Blimey, in all the excitement we have completely forgotten to test all this cool stuff!

Modern frameworks like Spring Boot remove lots of tedious boilerplate at the price of making your annotation-sprinkled code less deterministic. In short: you cannot unit-test caching behaviour. The @Cacheable annotated methods only work inside the container, so a plain JUnit test doesn’t cut it.

In a production environment you need to test all this. You must make sure that your cache does not hog all memory and evicts entries when it needs to. Ideally we want to peek inside the cache to make sure that entries were properly added, evicted and updated. Fortunately you can do all that with Spring:

@RunWith(SpringJUnit4ClassRunner.class)
@SpringApplicationConfiguration(classes = {Application.class})
public class SharesIntegrationTest {

    @Autowired
    CacheManager cacheManager;

    private Cache sharesCache;

    @Before
    public void setup() {
        sharesCache = getAndInvalidate(CacheConfig.SHARES_CACHE);
    }

    private Cache getAndInvalidate(String name) {
        //retrieve a reference to the underlying guava cache
        Cache guavaCache = (Cache) cacheManager.getCache(name)
                                               .getNativeCache();
        //clear all entries
        guavaCache.invalidateAll();
        return guavaCache;
    }

This test suite fires up a Spring container for nl.jsprengers.caching.Application. The CacheManager is a bean like any other and can be injected in our unit test. We can retrieve the underlying Guava cache and access the values as a map:

    @Test
    public void testShares() {
        float value = sharesService.getValue("AKZO");
        //the cache should contain a key for AKZO
        assertThat(sharesCache.asMap()).containsKey("AKZO");
        //this will cause the cache to be updated with a new price
        stockExchange.invalidateAllPrices();
        float updatedValue = sharesService.getValue("AKZO");
        assertThat(value).isNotEqualTo(updatedValue);
    }


Adding caching to your application can make dramatic improvements in terms of bandwidth, I/O or processor resources, but you must ask yourself two very important questions.

  1. Is it acceptable to return stale cache entries?
  2. What input can I expect? How frequent and with what range?

The answer to the first question probably lies outside the IT department. For the second question a simple analysis of log data will go a long way. Caching is like most other frameworks and tools that promise to make our lives easier: give them a try, but if you don’t stand to gain from them, don’t bother.
