GemFire: Long GC's and JVM Pauses due to insufficient Survivor space

Article ID: 294297

Products

VMware Tanzu Gemfire

Issue/Introduction

The default value for SurvivorRatio is 8 for most JVMs. SurvivorRatio is the ratio of Eden to one Survivor space, so each of the two Survivor spaces gets about 10% of the young generation, while Eden space is given the remaining 80% of NewSize.

Unfortunately, at least for GemFire systems, this default leaves far too little Survivor space relative to Eden to avoid long GCs and JVM pauses.

Below are a couple of charts illustrating how Survivor space, even when sized bigger, can still be insufficient when loading gigabytes of data in the GemFire cache.

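[Charts not reproduced here: Survivor space usage and JVM pauses during GemFire cache server startup.]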

The first chart shows the Survivor space getting overwhelmed during the startup phase of a GemFire cache server as all of the data is loaded. The second chart shows how this can negatively impact JVM pauses, potentially causing startup issues.

Environment

OS: Linux

Resolution

Stated simply, the default SurvivorRatio=8 makes the Survivor space too small to handle data at the rate it is ingested into the cache.

In tuning many customer systems in recent years, especially those with heaps larger than Xmx=5g (which really isn't that big), we have found that overriding this default and setting SurvivorRatio to a much lower value, 1 or 2, makes Survivor space much bigger and more capable of handling the surviving data copied out of Eden space.

For example, with NewSize=3g and the default SurvivorRatio=8, your Eden space would be about 2.4g, while each of the two Survivor spaces would be about 300m.

Instead, with SurvivorRatio adjusted to 1, the Eden space would be 1g and each of the two Survivor spaces would also be 1g. That is much more capable of absorbing the data copied into Survivor, and of holding long-lived GemFire cache data there until it is finally promoted to tenured space.
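The arithmetic behind these numbers can be sketched in a few lines of Java. This is a minimal illustration, assuming the usual relationship Eden = SurvivorRatio x one Survivor space; the class and method names are hypothetical, and a real JVM aligns the resulting sizes, so actual values may differ slightly.

```java
public class YoungGenLayout {
    /** Size of ONE Survivor space: the young generation is split into (ratio + 2) parts. */
    public static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden gets whatever the two Survivor spaces do not use. */
    public static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long youngBytes = 3L * 1024 * 1024 * 1024; // NewSize=3g, as in the example above
        long mb = 1024L * 1024;
        for (int ratio : new int[] {8, 2, 1}) {
            System.out.printf("SurvivorRatio=%d: Eden=%d MB, each Survivor=%d MB%n",
                    ratio,
                    edenBytes(youngBytes, ratio) / mb,
                    survivorBytes(youngBytes, ratio) / mb);
        }
    }
}
```

Running the same calculation with your own NewSize is a quick way to see how much Eden you give up for a given SurvivorRatio before changing any JVM flags.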

Some caution must be taken before altering your SurvivorRatio. It is important to consider the resulting Eden size and make sure it is not so small that it cannot handle the load of new object allocations. It is often necessary to increase NewSize (with MaxNewSize set to the same value) as well, so that Eden space itself doesn't shrink too much.
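One way to confirm the Eden and Survivor sizes that a running JVM actually ended up with is to read the memory pools through the standard java.lang.management API. The sketch below is only an illustration: the class name is hypothetical, and matching pools by the words "Eden" and "Survivor" is an assumption that fits the pool names used by the common HotSpot collectors (for example "PS Eden Space" or "G1 Survivor Space").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class YoungGenPools {
    /** Assumption: young generation pools have "Eden" or "Survivor" in their name. */
    public static boolean isYoungPool(String poolName) {
        return poolName.contains("Eden") || poolName.contains("Survivor");
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!isYoungPool(pool.getName())) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 means the maximum is undefined
            System.out.printf("%-20s committed=%d MB, max=%s%n",
                    pool.getName(),
                    usage.getCommitted() >> 20,
                    max < 0 ? "undefined" : (max >> 20) + " MB");
        }
    }
}
```

Run it with the same heap flags as the cache server to compare the reported Eden size against the allocation load you expect.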

When altering the SurvivorRatio, another flag is important to consider as well: TargetSurvivorRatio, which defaults to 50. This value determines what percentage of Survivor space can be filled before the GC algorithm decides to promote data to tenured space prematurely, in an effort to make sure Survivor space is not overwhelmed. The 50% default is generally excessive, essentially leaving half of Survivor space unused to protect you from long GCs. We have found that a setting of 60 or 70 still leaves sufficient headroom to protect you from long GCs. Do not go above 70 without very thorough testing of your use case against full production load.
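To make the effect of TargetSurvivorRatio more concrete, here is a simplified model of how the desired Survivor occupancy can translate into a tenuring threshold: object sizes are summed by age, and the first age at which the running total exceeds the desired occupancy becomes the threshold. This is a sketch that loosely follows HotSpot's behavior, not the JVM's actual code; the class, method names, and sample sizes are hypothetical, and MaxTenuringThreshold is passed in as a parameter.

```java
public class TenuringThresholdModel {
    /** Desired Survivor occupancy after a young GC, as a percentage of one Survivor space. */
    public static long desiredSurvivorBytes(long survivorCapacityBytes, int targetSurvivorRatio) {
        return survivorCapacityBytes * targetSurvivorRatio / 100;
    }

    /**
     * bytesByAge[i] holds the bytes of surviving objects of age i + 1.
     * Returns the age at which objects would be promoted to tenured space.
     */
    public static int tenuringThreshold(long[] bytesByAge, long desiredBytes, int maxTenuringThreshold) {
        long total = 0;
        for (int age = 1; age <= bytesByAge.length; age++) {
            total += bytesByAge[age - 1];
            if (total > desiredBytes) {
                // Survivor would fill past the target: promote from this age onward.
                return Math.min(age, maxTenuringThreshold);
            }
        }
        return maxTenuringThreshold; // target never exceeded, objects can age fully
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long[] bytesByAge = {100 * mb, 80 * mb, 40 * mb}; // made-up sample data
        for (int target : new int[] {50, 70}) {
            long desired = desiredSurvivorBytes(300 * mb, target);
            System.out.printf("TargetSurvivorRatio=%d: desired=%d MB, threshold=%d%n",
                    target, desired / mb, tenuringThreshold(bytesByAge, desired, 15));
        }
    }
}
```

In this model, a larger Survivor space or a higher TargetSurvivorRatio both raise the desired occupancy, which lets data stay in Survivor for more collections before being promoted.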

Finally, to fine-tune your heap properly, especially when adjusting SurvivorRatio and TargetSurvivorRatio, enable the PrintTenuringDistribution flag. There is no real cost to having this print flag enabled, and with it we can observe the behavior of your data and tune further if necessary.
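The flag is enabled with -XX:+PrintTenuringDistribution; on JDK 9 and later, where unified logging replaced it, -Xlog:gc+age=trace is the equivalent. As a rough sketch of how the output can be summarized, the Java snippet below extracts the per-age byte counts from GC log lines. The class name is hypothetical, the sample lines use made-up values, and the line format in the regular expression is an assumption based on typical JDK 8 output, so check it against your own logs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TenuringLogParser {
    // Assumed line shape: "- age   1:  104857600 bytes,  104857600 total"
    private static final Pattern AGE_LINE =
            Pattern.compile("- age\\s+(\\d+):\\s+(\\d+) bytes,\\s+(\\d+) total");

    /** Maps each age to its byte count; a later GC in the log overwrites earlier values. */
    public static Map<Integer, Long> bytesByAge(Iterable<String> logLines) {
        Map<Integer, Long> result = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = AGE_LINE.matcher(line);
            if (m.find()) {
                result.put(Integer.parseInt(m.group(1)), Long.parseLong(m.group(2)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative sample lines, not taken from a real system.
        List<String> sample = Arrays.asList(
                "Desired survivor size 157286400 bytes, new threshold 2 (max 15)",
                "- age   1:  104857600 bytes,  104857600 total",
                "- age   2:   83886080 bytes,  188743680 total");
        bytesByAge(sample).forEach((age, bytes) ->
                System.out.printf("age %d: %d MB%n", age, bytes >> 20));
    }
}
```

If most of the bytes sit at age 1 and the reported threshold stays low, Survivor space is likely still too small for the ingestion rate.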