Useful JVM Flags – Part 6 (Throughput Collector)

4.1.2012 | 10 minutes reading time

For most application areas that we find in practice, a garbage collection (GC) algorithm is being evaluated according to two criteria:

The higher the achieved throughput, the better the algorithm.
The smaller the resulting pause times, the better the algorithm.

Let us first clarify the terms “throughput” and “pause times” in the context of GC. The JVM always executes a GC in dedicated threads, the “GC threads”. Whenever GC threads are active, they compete against the actual “application threads” for available processors and thus CPU time. Simplified just a little bit, we refer by “throughput” to the fraction of total program execution time at which the application threads are running. For example, a throughput of 99/100 means that on average the application threads are running 99 out of 100 seconds of program execution time, while the GC threads are only running for one second during the same time span.

The term “pause time” refers to a time span in which the application threads are paused completely in favor of the GC threads. For example, a pause time of 100 milliseconds during a GC means that no application thread was active during that 100 millisecond interval. If we then refer to an “average pause time” of 100 milliseconds for a running application, we state that all pause times observed for that application had an average length of 100 milliseconds. Similarly, a “maximum pause time” of 100 milliseconds means that none of the pause times observed were larger than 100 milliseconds.

Throughput vs. pause times

A high throughput is desirable because only the application threads perform “productive” work in the perception of the end user of the application. Intuitively, an application runs faster if the throughput is higher. Low pause times are desirable as well, because from the end user’s perspective a hanging application is always undesirable regardless of whether the stalls are being caused by GC or other reasons. Depending on the type of application, even short pauses of 200 milliseconds can disrupt end user experience. Thus, it is important to have a low maximum pause time, in particular for an interactive application.

Unfortunately, “high throughput” and “low pause times” are competing goals. Think about it this way, again a bit simplified for the sake of clarity: A GC requires certain preconditions in order to run safely. For example, it must be guaranteed that application threads do not modify the state of objects while the GC threads try to decide which objects are still referenced and which ones are not. For this reason, the application threads must be stopped during a GC (or, depending on the algorithm used, only during certain phases of a GC). This, however, causes additional costs for thread scheduling: direct costs through context switches and indirect costs because of cache effects. Together with the costs for additional JVM-internal safety measures, this means that each GC comes along with some non-negligible overhead, which adds up with the time taken by the GC threads to perform their actual work. Therefore, we can maximize throughput by running the GC as seldom as possible, i.e., only when it is unavoidable, to save all the overhead associated with it.

However, executing the GC only infrequently means that whenever a GC is run it has much more work to do, as the number of objects that accumulated on the heap in the meantime is much higher. A single GC takes more time until completion, which in turn causes higher average and maximum pause times. Thus, with low pause times in mind, it would be desirable to run the GC more frequently so that each single run completes more quickly. This in turn adds overhead and causes throughput to decline, and we are back where we started.

In summary, when designing (or using!) a GC algorithm we have to decide what we are aiming for: A GC algorithm may target one of the two goals only (i.e., solely focus on maximum throughput or on minimal pause times) or try to find a compromise between them.

Garbage collection on the HotSpot JVM

We already discussed the young generation GC in part 5 of this series. For the old generation, the HotSpot JVM mainly offers two classes of GC algorithms (leaving aside the new G1 GC algorithm for now). The first class contains algorithms that try to maximize throughput while the second class tries to minimize pause times. Today we focus on the first, “throughput-oriented” class.

As we want to focus on configuration flags, I will only give a brief overview of the throughput-oriented GC algorithms offered by HotSpot. The GC algorithm is triggered when an object allocation in the old generation fails due to lack of space (usually, the “allocation” is in fact a promotion of an object from the young generation). Starting at so-called “GC roots”, the GC then searches the heap for reachable objects and marks them as alive. Afterwards, the GC moves the live objects within the old generation so that they occupy a single, non-fragmented memory block, and takes a note that the remaining memory area is free. That is, we don’t follow a copying strategy into a different heap area, like the young generation GC algorithm does. Instead, we keep all objects in the same heap area, thereby defragmenting that area. The collectors use one or more threads to perform GC. When more than one thread is used, the different steps of the algorithm are subdivided such that each GC thread mostly works in its own area without interfering with others. During a GC, all application threads are paused, getting restarted only when the GC is finished. Now let us take a look at the most important flags regarding the throughput-oriented GC algorithms.

-XX:+UseSerialGC

We use this flag to activate the serial, i.e., single-threaded version of the throughput-oriented garbage collector. Both the young generation and the old generation GC will be executed by a single GC thread only. This flag is to be recommended for JVMs that only have a single processor core available. In such a situation, using multiple GC threads would even be counterproductive because these threads would compete for CPU resources and cause synchronization overhead but never actually run in parallel.

-XX:+UseParallelGC

With this flag, we tell the JVM to execute the young generation GC in parallel using multiple GC threads. With Java 6, in my opinion, this flag should not be used because -XX:+UseParallelOldGC is clearly preferable. Note that with Java 7 the situation has changed a bit (see, e.g., this overview ) so that -XX:+UseParallelGC may be used to the same effect as -XX:+UseParallelOldGC.

-XX:+UseParallelOldGC

The naming of this flag is a little unfortunate because “old” sounds like “deprecated”. However, “old” actually refers to the old generation, which explains why -XX:+UseParallelOldGC is preferable over -XX:+UseParallelGC: in addition to a parallel young generation GC, it also activates parallel old generation GC. I recommend using this flag whenever high throughput it desired and the JVM has two or more processor cores available.

As a sidenote, the parallel versions of the throughput-oriented HotSpot GC algorithms are often called “throughput collectors” (or also: “the Throughput Collector”) because they aim at increased throughput through parallel execution.

-XX:ParallelGCThreads

With -XX:ParallelGCThreads= we can specify the number of GC threads to use for parallel GC. For example, with -XX:ParallelGCThreads=6 each parallel GC will be executed with six threads. If we don’t explicitly set this flag, the JVM will use a default value which is computed based on the number of available (virtual) processors. The determining factor is the value N returned by the Java method Runtime.availableProcessors(). For N <= 8 parallel GC will use just as many, i.e., N GC threads. For N > 8 available processors, the number of GC threads will be computed as 3+5N/8.

Using the default setting makes most sense when the JVM uses the system and its processors exclusively. However, if more than one JVM (or other CPU-hungry systems) are all running on the same machine, we should use -XX:ParallelGCThreads in order to reduce the number of GC threads to an adequate value. For example, if four server JVMs are running on a machine with 16 processor cores, then -XX:ParallelGCThreads=4 is a sensible choice so that GCs of different JVMs don’t interfere with each other.

-XX:-UseAdaptiveSizePolicy

The throughput collectors offer an interesting (but common, at least on modern JVMs) mechanism to improve user-friendliness of GC configuration. This mechanism is part of what is known as “ergonomics”, a concept introduced for HotSpot with Java 5. With ergonomics, the garbage collector may dynamically apply modifications to the different heap areas sizes as well as the GC settings if it has evidence that these modifications would improve GC performance. The precise meaning of “improve GC performance” may be specified by the user via the flags -XX:GCTimeRatio and -XX:MaxGCPauseMillis (see below).

It is important to know that ergonomics is activated by default – which is fine, as adaptive behavior is one of the biggest strengths of the JVM. Still, sometimes we may have a pretty clear idea of what settings are best for a particular application, and in these cases we might not want the JVM to mess around with our settings. Whenever we find ourselves in such a situation, we may consider deactivating some of the ergonomics by setting -XX:-UseAdaptiveSizePolicy.

-XX:GCTimeRatio

With -XX:GCTimeRatio= we tell the JVM a target value for the throughput to achieve. More precisely, a value of -XX:GCTimeRatio=N specifies a target fraction of N/(N+1) for the execution time of the application threads (related to the total program execution time). For example, with -XX:GCTimeRatio=9 we demand that the application threads should be active for at least 9/10 of the total execution time (and, accordingly, the GC threads for the remaining 1/10). Based on measurements at run time, the JVM will then try to modify the heap and GC configuration such that the the target throughput is reached. The default value for -XX:GCTimeRatio is 99, i.e., the application threads should run for at least 99 percent of the total execution time.

-XX:MaxGCPauseMillis

The flag -XX:MaxGCPauseMillis= tells the JVM a target value (in milliseconds) for the maximum pause time. At run time, the throughput collector then computes statistics (a weighted average and the standard deviation) over the pause times it observes. If the statistics suggest that there is a risk of experiencing pause times exceeding the target value, the JVM modifies heap and GC settings in order to reduce them. Note that the statistics are computed separately for the young and old generation GCs. Also note that, by default, there is no target value set for the maximum pause time.

If both target values for the maximum pause time and the minimum throughput are set, achieving the maximum pause time goal has higher priority. Of course, there is no guarantee at all that the JVM will achieve either goal, even though it will try hard to do so. In the end, everything depends on the behavior of the application at hand.

When setting a maximum pause time target, we should take care not to choose too small a value. As we know by now, in order to keep pause times low, the JVM has to increase the total number of GCs which may seriously impact the achievable throughput. That’s why for applications that require low pause times as their primary goal (which is the case for most web applications), I would recommend to not use the throughput collector at all, but instead switch to the CMS collector. The CMS collector will be the topic of the next part of this series.

Was this post helpful?

Blog author

Patrick Peschlow

Do you still have questions? Just send me a message.

fromPatrick Peschlow

Scaling an Elasticsearch Index – Introduction

A well-known design decision of Elasticsearch is that a fixed number of shards has to be specified when creating an index. It is not possible to start out with just one or only a few shards and add more shards later as the data increases. Now what to...

30.3.2015 | 7 minutes reading time

Patrick Peschlow

Transactions in Elasticsearch

Earlier this year a customer mentioned a search requirement that I hadn’t really thought about before: How to achieve transactions in Elasticsearch? Recently, the same requirement popped up again in a conversation I had with other search aficionados....

6.10.2014 | 8 minutes reading time

Patrick Peschlow

Elasticsearch Indexing Performance Cheatsheet

You plan to index large amounts of data in Elasticsearch? Or you are already trying to do so but it turns out that throughput is too low? Here is a collection of tips and ideas to increase indexing throughput with Elasticsearch. Some of them I have successfully...

NoSQL

8.5.2014 | 8 minutes reading time

Patrick Peschlow

Elasticsearch Monitoring and Management Plugins

Elasticsearch offers a highly useful plugin mechanism as a standard way for extending its core. Plugins enable developers to add new functionality, e.g., a custom analyzer, or provide alternatives to existing functionality, like swapping in another transport...

30.3.2014 | 11 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 8 (GC Logging)

The last part of this series is about garbage collection logging and associated flags. The GC log is a highly important tool for revealing potential improvements to the heap and GC configuration or the object allocation pattern of the application. For...

3.1.2014 | 8 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 7 (CMS Collector)

The Concurrent Mark Sweep Collector (“CMS Collector”) of the HotSpot JVM has one primary goal: low application pause times. This goal is important for most interactive applications like web applications. Before we take a look at the relevant JVM flags...

4.3.2013 | 10 minutes reading time

Patrick Peschlow

ForkJoinPool vs. ThreadPoolExecutor

Recently, an article of mine appeared on the German site Heise Developer, and today the English translation was published on The H Developer. The article gives an introduction to the Java 7 ForkJoinPool and explains for which application scenarios ...

25.11.2012 | 1 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 5 (Young Generation Garbage Collection)

In this part of our series we focus on one of the major areas of the heap, the “young generation”. First of all, we discuss why an adequate configuration of the young generation is so important for the performance of our applications. Then we move on...

18.8.2011 | 13 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 4 (Heap Tuning)

Ideally, a Java application runs just fine with the default JVM settings so that there is no need to set any flags at all. However, in case of performance problems (which unfortunately arise quite often) some knowledge about relevant JVM flags is a welcome...

2.7.2011 | 6 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 3 (Printing all XX Flags and their Values)

With a recent update of Java 6 (must have been update 20 oder 21), the HotSpot JVM offers two new command line flags which print a table of all XX flags and their values to the command line right after JVM startup. As many HotSpot users were longing ...

Java
APM

10.4.2011 | 4 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 2 (Flag Categories and JIT Compiler Diagnostics...

In the second part of this series, I give an introduction to the different categories of flags offered by the HotSpot JVM. Also, I am going to discuss some interesting flags regarding JIT compiler diagnostics. JVM flag categories The HotSpot JVM offers...

Java
APM

23.3.2011 | 9 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 1 (JVM Types and Compiler Modes)

Modern JVMs do an amazing job at running Java applications (and those of other compatible languages) in an efficient and stable manner. Adaptive memory management, garbage collection, just-in-time compilation, dynamic classloading, lock optimization ...

Java
APM

8.3.2011 | 6 minutes reading time

Patrick Peschlow

Your job at codecentric?

Jobs

Agile Developer und Consultant (w/d/m)

Alle Standorte

Useful JVM Flags – Part 6 (Throughput Collector)

Was this post helpful?

Blog author

More articles

Scaling an Elasticsearch Index – Introduction

Transactions in Elasticsearch

Elasticsearch Indexing Performance Cheatsheet

Elasticsearch Monitoring and Management Plugins

Useful JVM Flags – Part 8 (GC Logging)

Useful JVM Flags – Part 7 (CMS Collector)

ForkJoinPool vs. ThreadPoolExecutor

Useful JVM Flags – Part 5 (Young Generation Garbage Collection)

Useful JVM Flags – Part 4 (Heap Tuning)

Useful JVM Flags – Part 3 (Printing all XX Flags and their Values)

Useful JVM Flags – Part 2 (Flag Categories and JIT Compiler Diagnostics...

Useful JVM Flags – Part 1 (JVM Types and Compiler Modes)

Your job at codecentric?

Agile Developer und Consultant (w/d/m)