The purpose of this article is to help customers configure G1GC for the first time or after experiencing issues, and to make them aware of some of G1GC's behaviors.
It is certainly worth taking time to discuss which algorithm would be best in your environment. While G1GC is the current default, newer algorithms are worth consideration. Shenandoah has shown very nice improvements over G1, and given that success it has been backported to the latest Java 8 builds. ZGC is another option. Many online talks and videos describe how these algorithms behave compared to G1GC.
Please take the time to explore these options with your in-house JVM and Java GC experts. While GC tuning is an ongoing, living configuration that must be re-evaluated as loads increase, changing algorithms entirely is an even bigger decision.
G1GC is currently the default garbage collection algorithm used in current versions of Java. While some claim that G1GC is simple and can be incorporated effectively right out of the box with minimal configuration, it has been our experience that G1GC requires tuning, just like other algorithms.
This article discusses the various flags needed to fine-tune G1GC and how to give your system the best chance of staying stable while using G1.
Most customers have been using CMS for tenured-heap garbage collection for some time now. Now that CMS is deprecated (since Java 9) and removed completely as of Java 14, users are starting to transition to other collectors. G1GC, being the current default, is the primary option.
Many parameters require sufficient understanding to fine-tune GC in your environment. Many of these are described in this article, along with details explaining the reasons behind the recommendations.
Symptoms:
There are many symptoms that can be caused by an insufficient heap/GC configuration. In this section, we itemize the key G1GC flags and describe the issues you may experience if they are not tuned properly, or if resources are insufficient.
1. MaxGCPauseMillis: Many decisions are driven by this setting. Customers commonly make the mistake of setting this value much too low, especially when using larger heaps. It needs to be set higher, to at least 1000 or 2000 ms, and then the behavior observed. With the default value (200 ms) or values that are simply too low, behavior can deteriorate because the tenured heap just doesn't get collected sufficiently.
2. InitiatingHeapOccupancyPercent (IHOP): Please do not set this thinking it is a simple replacement for CMS's CMSInitiatingOccupancyFraction. The default is 45, and this is too high. It tells the system to only start considering garbage collection in the tenured heap once tenured-space consumption exceeds 45%. In that regard it is similar to CMS, but G1 does not collect garbage "aggressively" like CMS. When you pass the CMS occupancy fraction, all garbage in the tenured heap is collected in the very next collection. That does not happen with G1. In general, G1GC does not aggressively collect tenured space. Various criteria must be met as a whole, and on a per-region basis, to drive G1 to collect tenured space. Even when those criteria are met, it collects heap only in smaller portions, more slowly, governed by arguments like G1OldCSetRegionThresholdPercent. Because of the way these arguments interact, our recommendation is to start "(mixed)" collections much earlier than the default. We have found 25-35% to be a good tuning range, so that tenured-heap consumption is kept in check earlier and garbage doesn't consume more tenured heap than necessary.
3. G1NewSizePercent: The default value is 5%. From our experience over the last few years, especially with larger heaps, this value has proven simply much too low. Allowing the GC algorithm to shrink Eden to only 5% of the heap drives much more frequent collections, often at very bad times. When the tenured heap is growing and MaxGCPauseMillis is too low, the combination drives G1 to reduce Eden space. If this occurs during a burst of activity, you actually want Eden to remain large enough to handle that burst. Instead, G1 allows the tenured heap to grow, often with much garbage, and rather than effectively collecting that garbage, it shrinks Eden. This overwhelms the Survivor space as well and actually causes more promotions into the tenured heap, the exact opposite of what is needed. The recommendation is to keep this high enough for your use case. If you know you need more young-generation space relative to tenured heap, keep it higher. The suggestion is to set this to 15-25% at minimum, and even higher during fine tuning.
4. G1MaxNewSizePercent: Similarly, the G1 default for how large Eden can grow is 60%. This may simply be too high, leaving you short of other spaces if G1 has taken Eden up to 60%. The combination of these two settings seems to work best as a range such as 15-40 or 25-50, depending again on your load, data behavior, use of queries, etc.
5. G1HeapWastePercent: With the default (5%), the system is essentially willing to let quite a bit of garbage accumulate in tenured G1 regions. For a 100 GB heap, that can be an extreme amount of garbage. Lowering this value increases the cost of "(mixed)" collections, but that cost is manageable compared to the severe issues you may experience from heap exhaustion. The recommendation is to set this closer to 3% and then fine-tune based on the observed behavior of the tenured heap.
6. G1MixedGCLiveThresholdPercent: This is another setting that prevents timely collection of garbage in tenured space. While G1HeapWastePercent above covers the heap as a whole, this setting applies to each G1 region individually. For large heaps, your G1 region size is likely 32 MB. Some are still using JDK versions where the default is 65%; the newer default has been raised to 85%, due to issues like those we have been seeing. Specifically, if you have 2000+ 32 MB G1 regions and this is set too low, such as 65%, that could be 20 GB or more of garbage sitting in these regions yet still not deemed "worthy" of collection. This is of course an extreme case, but that is the behavior if the spread of data across regions deteriorates toward the worst case. The recommendation is to set this to 85% at minimum, and perhaps better 90-95%. To raise it to 90+, you must pair it with a high enough MaxGCPauseMillis, likely in the 2000-4000 ms range for large heaps.
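Putting the recommendations above together, the sketch below shows what a starting command line might look like for a large heap. This is an illustrative starting point only, not a definitive configuration: the 100g heap size and the application name are placeholders, and every value should be adjusted against your own GC logs. Note that G1MixedGCLiveThresholdPercent is an experimental flag, so it requires -XX:+UnlockExperimentalVMOptions.

```shell
#!/bin/sh
# Hypothetical starting point based on the recommendations in this article.
# - MaxGCPauseMillis=2000: raised well above the 200 ms default so tenured
#   space actually gets collected.
# - InitiatingHeapOccupancyPercent=30: start mixed-collection cycles earlier
#   than the 45% default.
# - G1NewSizePercent=20 / G1MaxNewSizePercent=40: keep Eden large enough for
#   bursts while capping it below the 60% default.
# - G1HeapWastePercent=3: tolerate less accumulated garbage before mixing.
# - G1MixedGCLiveThresholdPercent=90: make more tenured regions eligible.
# - -Xlog:gc* (JDK 9+): capture GC logs so you can verify each change.
# "MyApp.jar" and the 100g sizes are placeholders; tune for your environment.
java -Xms100g -Xmx100g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=2000 \
  -XX:InitiatingHeapOccupancyPercent=30 \
  -XX:G1NewSizePercent=20 \
  -XX:G1MaxNewSizePercent=40 \
  -XX:G1HeapWastePercent=3 \
  -XX:+UnlockExperimentalVMOptions \
  -XX:G1MixedGCLiveThresholdPercent=90 \
  -Xlog:gc*:file=gc.log \
  -jar MyApp.jar
```

Change one flag at a time where possible, and compare GC logs before and after, so you can attribute any improvement or regression to a specific setting.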