Useful JVM Flags – Part 7 (CMS Collector)

4.3.2013 | 10 minutes reading time

The Concurrent Mark Sweep Collector (“CMS Collector”) of the HotSpot JVM has one primary goal: low application pause times. This goal is important for most interactive applications like web applications. Before we take a look at the relevant JVM flags, let us briefly recap the operation of the CMS Collector and the main challenges that may arise when using it.

Just like the Throughput Collector (see part 6 of the series), the CMS Collector handles objects in the old generation, yet its operation is much more complex. The Throughput Collector always pauses the application threads, possibly for a considerable amount of time, which however enables its algorithms to safely ignore the application. In contrast to that, the CMS Collector is designed to run mostly concurrent to the application threads and only cause few (and short) pause times. The downside of running GC concurrently to the application is that various synchronization and data inconsistency issues may arise. In order to achieve safe and correct concurrent execution, a GC cycle of the CMS Collector is divided into a number of consecutive phases.

Phases of the CMS Collector

A GC cycle of the CMS Collector consists of six phases. Four of the phases (the names of which start with „Concurrent“) are run concurrently to the actual application while the other two phases need to stop the application threads.

Initial Mark: The application threads are paused in order to collect their object references. When this is finished, the application threads are started again.
Concurrent Mark: Starting from the object references collected in phase 1, all other referenced objects are traversed.
Concurrent Preclean: Changes to object references made by the application threads while phase 2 was running are used to update the results from phase 2.
Remark: As phase 3 is concurrent as well, further changes to object references may have happened. Therefore, the application threads are stopped once more to take any such updates into account and ensure a correct view of referenced objects before the actual cleaning takes place. This step is essential because it must be avoided to collect any objects that are still referenced.
Concurrent Sweep: All objects that are not referenced anymore get removed from the heap.
Concurrent Reset: The collector does some housekeeping work so that there is a clean state when the next GC cycle starts.

A common misconception is that the CMS collector runs fully concurrent to the application. We have seen that this is not the case, even if the stop-the-world phases are usually very short when compared to the concurrent phases.

It should be noted that, even though the CMS Collector offers a mostly concurrent solution for old generation GCs, young generation GCs are still handled using a stop-the-world approach. The rationale behind this is that young generation GCs are typically short enough so that the resulting pause times are satisfactory even for interactive applications.

Challenges
When using the CMS Collector in real-world applications, we face two major challenges that may create a need for tuning:

Heap fragmentation
High object allocation rate

Heap fragmentation is possible because, unlike the Throughput Collector, the CMS Collector does not contain any mechanism for defragmentation. As a consequence, an application may find itself in a situation where an object cannot be allocated even though the total heap space is far from exhausted – simply because there is no consecutive memory area available to fully accommodate the object. When this happens, the concurrent algorithms do not help anymore and thus, as a last resort, the JVM triggers a full GC. Recall that a full GC runs the algorithm used by the Throughput Collector and thus resolves the fragmentation issues – but it also stops the application threads. Thus, despite all the concurrency that the CMS Collector brings, there is still a risk of a long stop-the-world pause happening. This is “by design” and cannot be switched off – we can only reduce its likelihood by tuning the collector. Which is problematic for interactive applications that would like to have a 100% guarantee of being safe from any noticeable stop-the-world pauses.

The second challenge is high object allocation rate of the application. If the rate at which objects get instantiated is higher than the rate at which the collector removes dead objects from the heap, the concurrent algorithm fails once again. At some point, the old generation will not have enough space available to accommodate an object that is to be promoted from the young generation. This situation is referred to as “concurrent mode failure”, and the JVM reacts just like in the heap fragmentation scenario: It triggers a full GC.

When one of these scenarios manifests itself in practice (which, as is so often the case, usually happens on a production system), it often turns out that there is an unnecessary large amount of objects in the old generation. One possible countermeasure is to increase young generation size, in order to prevent premature promotions of short-lived objects into the old generation. Another approach is to use a profiler, or take heap dumps of the running system, to analyze the application for excessive object allocation, identify these objects, and eventually reduce the amount of objects allocated.

In the following we will take a look at the most relevant JVM flags available for tuning the CMS Collector.

-XX:+UseConcMarkSweepGC

This flag is needed to activate the CMS Collector in the first place. By default, HotSpot uses the Throughput Collector instead.

-XX:+UseParNewGC

When the CMS collector is used, this flag activates the parallel execution of young generation GCs using multiple threads. It may seem surprising at first that we cannot simply reuse the flag -XX:+UseParallelGC known from the Throughput Collector, because conceptually the young generation GC algorithms used are the same. However, since the interplay between the young generation GC algorithm and the old generation GC algorithm is different with the CMS collector, there are two different implementations of young generation GC and thus two different flags.

Note that with recent JVM versions -XX:+UseParNewGC is enabled automatically when -XX:+UseConcMarkSweepGC is set. As a consequence, if parallel young generation GC is not desired, it needs to be disabled by setting -XX:-UseParNewGC.

-XX:+CMSConcurrentMTEnabled

When this flag is set, the concurrent CMS phases are run with multiple threads (and thus, multiple GC threads work in parallel with all the application threads). This flag is already activated by default. If serial execution is preferred, which may make sense depending on the hardware used, multithreaded execution can be deactivated via -XX:-CMSConcurrentMTEnabled.

-XX:ConcGCThreads

The flag -XX:ConcGCThreads= (in earlier JVM versions also known as -XX:ParallelCMSThreads) defines the number of threads with which the concurrent CMS phases are run. For example, value=4 means that all concurrent phases of a CMS cycle are run using 4 threads. Even though a higher number of threads may well speed up the concurrent CMS phases, it also causes additional synchronization overhead. Thus, for a particular application at hand, it should be measured if increasing the number of CMS threads really brings an improvement or not.

If this flag is not explicitly set, the JVM computes a default number of parallel CMS threads which depends on the value of the flag -XX: ParallelGCThreads known from the Throughput Collector. The formula used is ConcGCThreads = (ParallelGCThreads + 3)/4. Thus, with the CMS Collector, the flag -XX:ParallelGCThreads does not only affect stop-the-world GC phases but also the concurrent phases.

In summary, there are quite a few ways of configuring multithreaded execution of the CMS collector. Precisely for this reason, it is recommended to first run the CMS Collector with its default settings and then measure if there is a need for tuning at all. Only if measurements in a production system (or a production-like test system) show that the pause time goals of the application are not reached, GC tuning via these flags should be considered.

-XX:CMSInitiatingOccupancyFraction

The Throughput Collector starts a GC cycle only when the heap is full, i.e., when there is not enough space available to store a newly allocated or promoted object. With the CMS Collector, it is not advisable to wait this long because it the application keeps on running (and allocating objects) during concurrent GC. Thus, in order to finish a GC cycle before the application runs out of memory, the CMS Collector needs to start a GC cycle much earlier than the Throughput Collector.

As different applications have different object allocation patterns, the JVM collects run time statistics about the actual object allocations (and deallocations) it observes and uses them to determine when to start a CMS GC cycle. To bootstrap this process, the JVM takes a hint when to start the very first CMS run. The hint may be set via -XX:CMSInitiatingOccupancyFraction= where value denotes the utilization of old generation heap space in percent. For example, value=75 means that the first CMS cycle starts when 75% of the old generation is occupied. Traditionally, the default value of CMSInitiatingOccupancyFraction is 68 (which was determined empirically quite some time ago).

-XX+UseCMSInitiatingOccupancyOnly

We can use the flag -XX+UseCMSInitiatingOccupancyOnly to instruct the JVM not to base its decision when to start a CMS cycle on run time statistics. Instead, when this flag is enabled, the JVM uses the value of CMSInitiatingOccupancyFraction for every CMS cycle, not just for the first one. However, keep in mind that in the majority of cases the JVM does a better job of making GC decisions than us humans. Therefore, we should use this flag only if we have good reason (i.e., measurements) as well as really good knowledge of the lifecycle of objects generated by the application.

-XX:+CMSClassUnloadingEnabled

In contrast to the Throughput Collector, the CMS Collector does not perform GC in the permanent generation by default. If permanent generation GC is desired, it can be enabled via -XX:+CMSClassUnloadingEnabled. In earlier JVM versions, it may be required to additionally set the flag -XX:+CMSPermGenSweepingEnabled. Note that, even if this flag is not set, there will be an attempt to garbage-collect permanent generation once it runs out of space, but the collection will not be concurrent – instead, once again, a full GC will be run.

-XX:+CMSIncrementalMode

This flag activates the incremental mode of the CMS Collector. Incremental mode pauses the concurrent CMS phases regularly, so as to yield completely to the application threads. As a consequence, the collector will take longer to complete a whole CMS cycle. Therefore, using incremental mode only makes sense if it has been measured that the threads running a normal CMS cycle interfere too much with the application threads. This happens rather rarely on modern server hardware which usually has enough processors available to accommodate concurrent GC.

-XX:+ExplicitGCInvokesConcurrent and -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses

Nowadays, the widely accepted best practice is to refrain from explicitly invoking GC (a so-called “system GC”) by calling System.gc() in the application. While this advice holds regardless of the GC algorithm used, it is worth mentioning that a system GC is an especially unfortunate event when the CMS Collector is used, because it triggers a full GC by default. Luckily, there is a way to change the default. The flag -XX:+ExplicitGCInvokesConcurrent instructs the JVM to run a CMS GC instead of a full GC whenever system GC is requested. There is a second flag, -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses, which additionally ensures that the permanent generation is included into the CMS GC in case of a system GC request. Thus, by using these flags, we can safeguard ourselves against unexpected stop-the-world system GCs.

-XX:+DisableExplicitGC

And while we are on the subject… this is a good opportunity to mention the flag -XX:+DisableExplicitGC which tells the JVM to completely ignore system GC requests (regardless of the type of collector used). For me, this flag belongs to a set of “default” flags that can be safely specified on every JVM run without further thinking.

Was this post helpful?

Blog author

Patrick Peschlow

Do you still have questions? Just send me a message.

fromPatrick Peschlow

Scaling an Elasticsearch Index – Introduction

A well-known design decision of Elasticsearch is that a fixed number of shards has to be specified when creating an index. It is not possible to start out with just one or only a few shards and add more shards later as the data increases. Now what to...

30.3.2015 | 7 minutes reading time

Patrick Peschlow

Transactions in Elasticsearch

Earlier this year a customer mentioned a search requirement that I hadn’t really thought about before: How to achieve transactions in Elasticsearch? Recently, the same requirement popped up again in a conversation I had with other search aficionados....

6.10.2014 | 8 minutes reading time

Patrick Peschlow

Elasticsearch Indexing Performance Cheatsheet

You plan to index large amounts of data in Elasticsearch? Or you are already trying to do so but it turns out that throughput is too low? Here is a collection of tips and ideas to increase indexing throughput with Elasticsearch. Some of them I have successfully...

NoSQL

8.5.2014 | 8 minutes reading time

Patrick Peschlow

Elasticsearch Monitoring and Management Plugins

Elasticsearch offers a highly useful plugin mechanism as a standard way for extending its core. Plugins enable developers to add new functionality, e.g., a custom analyzer, or provide alternatives to existing functionality, like swapping in another transport...

30.3.2014 | 11 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 8 (GC Logging)

The last part of this series is about garbage collection logging and associated flags. The GC log is a highly important tool for revealing potential improvements to the heap and GC configuration or the object allocation pattern of the application. For...

3.1.2014 | 8 minutes reading time

Patrick Peschlow

ForkJoinPool vs. ThreadPoolExecutor

Recently, an article of mine appeared on the German site Heise Developer, and today the English translation was published on The H Developer. The article gives an introduction to the Java 7 ForkJoinPool and explains for which application scenarios ...

25.11.2012 | 1 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 6 (Throughput Collector)

For most application areas that we find in practice, a garbage collection (GC) algorithm is being evaluated according to two criteria: The higher the achieved throughput, the better the algorithm.The smaller the resulting pause times, the better the ...

4.1.2012 | 10 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 5 (Young Generation Garbage Collection)

In this part of our series we focus on one of the major areas of the heap, the “young generation”. First of all, we discuss why an adequate configuration of the young generation is so important for the performance of our applications. Then we move on...

18.8.2011 | 13 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 4 (Heap Tuning)

Ideally, a Java application runs just fine with the default JVM settings so that there is no need to set any flags at all. However, in case of performance problems (which unfortunately arise quite often) some knowledge about relevant JVM flags is a welcome...

2.7.2011 | 6 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 3 (Printing all XX Flags and their Values)

With a recent update of Java 6 (must have been update 20 oder 21), the HotSpot JVM offers two new command line flags which print a table of all XX flags and their values to the command line right after JVM startup. As many HotSpot users were longing ...

Java
APM

10.4.2011 | 4 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 2 (Flag Categories and JIT Compiler Diagnostics...

In the second part of this series, I give an introduction to the different categories of flags offered by the HotSpot JVM. Also, I am going to discuss some interesting flags regarding JIT compiler diagnostics. JVM flag categories The HotSpot JVM offers...

Java
APM

23.3.2011 | 9 minutes reading time

Patrick Peschlow

Useful JVM Flags – Part 1 (JVM Types and Compiler Modes)

Modern JVMs do an amazing job at running Java applications (and those of other compatible languages) in an efficient and stable manner. Adaptive memory management, garbage collection, just-in-time compilation, dynamic classloading, lock optimization ...

Java
APM

8.3.2011 | 6 minutes reading time

Patrick Peschlow

Your job at codecentric?

Jobs

Agile Developer und Consultant (w/d/m)

Alle Standorte

Useful JVM Flags – Part 7 (CMS Collector)

Was this post helpful?

Blog author

More articles

Scaling an Elasticsearch Index – Introduction

Transactions in Elasticsearch

Elasticsearch Indexing Performance Cheatsheet

Elasticsearch Monitoring and Management Plugins

Useful JVM Flags – Part 8 (GC Logging)

ForkJoinPool vs. ThreadPoolExecutor

Useful JVM Flags – Part 6 (Throughput Collector)

Useful JVM Flags – Part 5 (Young Generation Garbage Collection)

Useful JVM Flags – Part 4 (Heap Tuning)

Useful JVM Flags – Part 3 (Printing all XX Flags and their Values)

Useful JVM Flags – Part 2 (Flag Categories and JIT Compiler Diagnostics...

Useful JVM Flags – Part 1 (JVM Types and Compiler Modes)

Your job at codecentric?

Agile Developer und Consultant (w/d/m)