GemFire: Heap LRU Eviction With The G1 Garbage Collector

Article ID: 388242


Products

  • Gemfire
  • VMware Tanzu Gemfire
  • Pivotal GemFire
  • VMware GemFire Enterprise Edition

Issue/Introduction

GemFire’s heap LRU eviction policy, unlike the other two eviction policies (entry count and absolute memory usage), sets up a feedback loop between GemFire’s resource manager and the Java garbage collector. Because of that feedback loop, the Java garbage collector must be configured for compatibility, and that configuration differs for each GC algorithm: CMS, ZGC, and G1GC.

The G1GC algorithm presents particular challenges when used with GemFire heap LRU eviction. While configuring G1GC to work with heap LRU eviction takes effort, the combination offers certain benefits. One important benefit is that G1GC can be more space-efficient than ZGC for smaller heap sizes.

If you are running GemFire on JDK 17 or higher and your heap is smaller than 32GB, the default GemFire configuration might not be ideal. On JDK 17, the gfsh start command selects the Z garbage collector. From Memory Requirements for Cached Data:

For heap sizes smaller than 32 GB, 64-bit JVMs using the CMS or G1 garbage collector can benefit from Java’s -XX:+UseCompressedOops option, which significantly reduces Java’s heap usage. Java enables this option by default on JVMs that can benefit from it. JVMs using ZGC and JVMs with heap sizes of 32 GB or larger cannot use this option.

So, depending on your application’s requirements, switching to G1 may meet your needs with improved storage efficiency. However, if you choose to use the G1 garbage collector and you use GemFire’s heap LRU eviction feature, you need to be aware of the additional tradeoffs, testing, and tuning that will be required.

This article explains how to configure GemFire and the JDK to support heap LRU eviction on heaps smaller than 32GB using G1GC on JDK 17 and up.

Environment

  • GemFire version 10.0 and later
  • JDK 17 and up. Java heap smaller than 32GB.

Resolution

If you need more efficient storage in a small heap on JDK 17, you can disable the Z garbage collector via the -XX:-UseZGC java command line option. With ZGC disabled, the JDK 17 default garbage collector, G1, takes effect. If you want to be certain that you are using the G1 garbage collector, you can specify both options: -XX:-UseZGC -XX:+UseG1GC.
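
For example, when starting a server with gfsh, these JVM options can be passed through the --J option. This is only a sketch; the server name is illustrative and any other options your deployment requires still apply:

start server --name=server1 --J=-XX:-UseZGC --J=-XX:+UseG1GC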

If you decide to use the G1 garbage collector, additional configuration work will be needed. You need to:

  • Address Humongous Objects
  • Force G1GC to Perform Frequent Garbage Collection

Address Humongous Objects

The G1 garbage collector divides the heap into regions of a fixed size. Objects larger than one-half the region size are deemed “humongous” and are stored in special regions called humongous regions. Humongous objects incur both space and CPU penalties, so they should be avoided where possible.

Both GemFire and Java’s G1 garbage collector use the term “region”. We’re going to talk about both, so be alert to which kind of region we are talking about in the remainder of this article.

GemFire’s fundamental data structure is also called a region. A GemFire region is a hash map, and that map contains entries. Most of the Java heap storage consumed by a GemFire application is attributable to these map entries. Any region entry that is larger than one half the G1 (garbage collector) region size will incur the performance penalties associated with humongous objects.

If you are using the G1 garbage collector, it is important to know the maximum size (heap utilization) of your region entries. Once you know that size, you can configure G1 to treat your entries as regular (not humongous) objects. To calculate the size of your entries, refer to Memory Requirements for Cached Data in the GemFire documentation.

Once the maximum entry size is known, compare it to the G1 heap region size. From Oracle’s Garbage First Garbage Collector Tuning:

-XX:G1HeapRegionSize=n

Sets the size of a G1 region. The value will be a power of two and can range from 1MB to 32MB. The goal is to have around 2048 regions based on the minimum Java heap size.

If no -XX:G1HeapRegionSize=n option is specified, then the JDK will choose a G1HeapRegionSize. When the JDK chooses the G1HeapRegionSize for itself, it’s called an “ergonomic” value. From the documentation we see it will be approximately the minimum Java heap size divided by 2048 and rounded up to the nearest power of two. The minimum heap size is given by the --initial-heap option on the gfsh start command or by the -Xms java command line option. 

If the ergonomic value of G1HeapRegionSize is less than twice the size of your largest GemFire region entry, then you should set a larger G1HeapRegionSize. If you are unsure of the value of G1HeapRegionSize, you can use the java command line option -XX:+PrintFlagsFinal to see it.
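
As an illustrative example (the numbers are assumptions, not recommendations): with --initial-heap=8g, the ergonomic G1HeapRegionSize would be approximately 8GB / 2048 = 4MB. A 3MB region entry would then be humongous, because it exceeds half the 4MB region size, so you could raise the region size so that the entry is no longer humongous:

-XX:G1HeapRegionSize=8m

To check the ergonomic value for a given heap size before settling on an override, one option is to print the final flag values with a throwaway JVM, for example:

java -Xms8g -Xmx8g -XX:+UseG1GC -XX:+PrintFlagsFinal -version | grep G1HeapRegionSize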

As a final check, you can use Java’s GC logging to see if your region entries are being treated as humongous objects. The HotSpot Virtual Machine Garbage Collection Tuning Guide says:

You can determine the number of regions occupied by humongous objects on the Java heap using gc+heap=info logging. The value Y in the lines "Humongous regions: X->Y" gives you the number of regions occupied by humongous objects.
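
As a sketch, that logging can be enabled on a JDK 17 server JVM with the unified logging option below (the log file path is illustrative), and the resulting log can then be searched for "Humongous regions:" lines:

--J=-Xlog:gc+heap=info:file=/var/log/gemfire/gc.log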

Force G1GC to Perform Frequent Garbage Collection

If you use GemFire’s heap LRU eviction feature with the G1 garbage collector then you need to tune the collector for compatibility, and that tuning will impact application performance and CPU utilization.

Regions with lru-heap-percentage set in the eviction-attributes will begin evicting entries when Java heap memory utilization exceeds some threshold. That threshold is set via the --eviction-heap-percentage option to the gfsh start server command, or via the eviction-heap-percentage attribute of the resource-manager element in cache XML. These settings result in what is called “heap LRU eviction”, and the value of eviction-heap-percentage is often referred to as the “eviction threshold”.
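
For example, the eviction threshold can be set when starting a server with gfsh. This is only a sketch; the server name and the 75 percent threshold are illustrative values, not recommendations:

start server --name=server1 --eviction-heap-percentage=75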

For a detailed explanation of heap LRU eviction see Managing Heap Memory. For more information on all the policy alternatives see How Eviction Works.

A garbage collector strives to identify and recycle the garbage in accordance with certain CPU, heap space, and latency tradeoffs. Over time, the amount of garbage in the heap varies. In its default configuration the G1 garbage collector allows the amount of garbage to vary widely. This wide variation foils GemFire’s heap LRU algorithm.

There are two scenarios where large amounts of garbage can foil heap LRU eviction. First, during a sudden spike of application activity, if the garbage collector doesn’t respond quickly, a large amount of garbage can build up, causing region entries to be evicted unnecessarily. If the garbage collector could work a bit faster, that eviction and the subsequent access latency could be avoided.

Second, consider a period of sustained application activity in which region entries are added and perhaps some are evicted. If application activity then suddenly drops off and the garbage collector sits idle, a lot of garbage stays on the heap, which can cause region entries to be continually and unnecessarily evicted.

In summary:

Scenario | Garbage Collector might… | Possible Problematic Heap LRU Eviction Result
Sudden spike in application activity | Fail to work fast enough | Premature eviction
Application inactivity after sustained activity | Sit idle | Continual eviction

The situation we would like to arrange is one in which the amount of garbage on the heap is bounded, regardless of application activity. This isn’t so important at low heap utilization but it becomes increasingly important as utilization increases. The ZGC and CMS algorithms have tuning settings that work well in this regard. For those two GC algorithms, the tuning settings are configured relative to GemFire’s eviction-heap-percentage setting.

Experimentation and experience with the G1 garbage collector, on the other hand, have identified no straightforward tuning approach related to eviction-heap-percentage. But by setting the -XX:G1PeriodicGCInterval option it is nevertheless possible to address many application scenarios. Unlike the settings for ZGC and CMS, which are relatively easy to discover, the setting for the G1 garbage collector is more sensitive to application workload. Discovering an acceptable setting for G1 requires realistic testing.

This passage from the Periodic Garbage Collections section of Oracle’s Garbage-First (G1) Garbage Collector describes the setting:

If there is no garbage collection for a long time because of application inactivity, the VM may hold on to a large amount of unused memory for a long time that could be used elsewhere. To avoid this, G1 can be forced to do regular garbage collection using the -XX:G1PeriodicGCInterval option. This option determines a minimum interval in ms at which G1 considers performing a garbage collection.

It’s easy to see how this setting might help in the scenario where the garbage collector might otherwise sit idle in periods of application inactivity. It turns out that the same setting also helps avoid premature eviction during sudden spikes in application activity.

Here are typical -XX:G1PeriodicGCInterval values that have proven useful in addressing the heap LRU eviction pitfalls described above. The time settings required to address the two problem scenarios can differ depending on the application, so you will have to pick a setting that offers a workable compromise for you.

Problem Scenario | Value of -XX:G1PeriodicGCInterval=m to address the problem in isolation (typical)
Premature eviction during sudden spike in application activity | As low as 500 milliseconds
Continual eviction during application inactivity | 1000 to 25000 milliseconds
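
As a starting point for testing, the interval can be passed to the server JVM in the same way as the other G1 options. This is only a sketch; the server name is illustrative, and the 2000 millisecond value is just one point in the range above, not a recommendation:

start server --name=server1 --J=-XX:-UseZGC --J=-XX:+UseG1GC --J=-XX:G1PeriodicGCInterval=2000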

 

As you might expect, forcing the garbage collector to do more work, more frequently, will impact both application performance and CPU utilization.

Here is the sort of application performance and CPU utilization impact you can expect [1]:

Partitioned Region Benchmark | -XX:G1PeriodicGCInterval=m | Relative Throughput Decrease | Relative Latency Increase | Relative CPU Increase (loadAverage1)
Client-server PUT | 500 | 10% | 5% | 12%
Peer-to-peer GET | 500 | 10% | 14% | 8%
Client-server PUT | 2000 | 1% | 1% | 9%
Peer-to-peer GET | 2000 | 5% | 1% | 3%
Client-server PUT | 10000 | 1% | 1% | 13%
Peer-to-peer GET | 10000 | 3% | 2% | 0%

If you decide to use the G1 garbage collector on JDK 17, you must test your application and GemFire settings on realistic hardware, with representative application load. This is particularly important if you are using GemFire’s heap LRU eviction feature. The following section shows how to use GemFire statistics to troubleshoot common problems.

Helpful GemFire Statistics

GemFire statistics provide insight into the real-time behavior of GemFire and the garbage collector. GemFire statistics are not enabled by default. See Configuring and Using Statistics in the Tanzu GemFire documentation for more information.
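
As a sketch, statistics sampling and an archive file can be enabled when starting a server by setting the corresponding GemFire properties; the server name and archive file name here are illustrative:

start server --name=server1 --J=-Dgemfire.statistic-sampling-enabled=true --J=-Dgemfire.statistic-archive-file=server1.gfs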

These GemFire statistics will help you assess heap LRU eviction performance in the context of G1GC:

  • Application Demand
    • CachePerfStats
      • Stats for whatever operation types your application performs such as puts/sec, or queryExecutions/sec
  • Region Entries
    • DiskRegionStatistics: if you have configured the “overflow to disk” eviction action then it is very easy to compare the number of region entries in memory vs the number evicted (to disk). 
      • entriesInVM
      • entriesOnlyOnDisk
  • Resource Manager (heap LRU eviction)
    • ResourceManagerStats
      • criticalThreshold
      • evictionThreshold
      • evictionStartEvents: if you have configured “local destroy” eviction action (instead of “overflow to disk”) then you can use eviction start/stop events to see when eviction start and stops, but you won’t be able to directly count evicted entries
      • evictionStopEvents
  • Heap
    • VMStats
      • maxMemory
  • G1 Garbage Collector
    • VMMemoryPoolStats “G1OldGen-Heapmemory”
      • currentUsedMemory
    • VMGCStats “G1YoungGeneration”, “G1OldGeneration”
      • collections/sec

 

Additional Information

[1] Tests were run on a single 32-Intel-core host with 64GB RAM. Client-server tests used two client JVMs and one server JVM. Peer-to-peer tests ran two server JVMs. All server JVMs were configured for 24GB heaps.