Gemfire with ZGC Top Command

Article ID: 436929

Products

Pivotal GemFire, VMware Tanzu GemFire

Issue/Introduction

The Linux top command is a common and widely used tool, but it does not report GemFire memory usage accurately when the Z Garbage Collector (ZGC) is in use. This can lead users to believe that members are using more memory than they are configured for.

This article explains the high CPU and memory utilization levels observed in monitoring tools, such as the Linux top command, when GemFire runs with ZGC. These readings are not indicators of resource exhaustion: the memory figures are expected reporting artifacts of how ZGC maps the Java heap, and the CPU spike is expected startup behavior of the JVM configuration.

Cause

Standard monitoring tools like top are not "ZGC-aware", leading them to significantly over-report memory consumption. This over-reporting is an inherent artifact of how ZGC manages its memory footprint.

Misleading Virtual Memory (VIRT)

Virtual memory (VIRT) is simply a reservation of virtual address space, not a consumption of physical RAM. ZGC uses 64-bit "colored pointers" to store object metadata, which requires it to map the same physical heap memory at multiple large virtual address ranges (the Marked0, Marked1, and Remapped views). On 64-bit Linux systems, ZGC automatically reserves enough virtual address space to hold these "views," which has no impact on the physical memory actually available.
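To see the same VIRT, RES, and %MEM values that top displays for one GemFire member, without the interactive screen, ps can print them directly. This is a minimal sketch: the function name `top_view` is hypothetical, and the PID in the usage line is a placeholder for the member's process ID. On a ZGC member, expect VSZ to be far larger than the configured heap; that reflects reserved address space, not RAM in use.

```shell
# Hypothetical helper (not part of GemFire): print the memory columns top shows
# for one process. VSZ corresponds to top's VIRT and RSS to top's RES (both in kB);
# %MEM is RES as a percentage of total physical RAM.
top_view() {
    # An empty header ("field=") on every column suppresses the header line.
    ps -o vsz= -o rss= -o pmem= -p "$1"
}

# Usage (replace 12345 with the GemFire member's PID):
# top_view 12345
```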

Misleading Percentage Memory (%MEM)

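%MEM readings above 100% occur because top calculates %MEM from RES (resident memory), and RES counts a resident page once for every virtual address at which it is mapped. Because the heap is mapped into three identical virtual views, top effectively sums them together. For example, with a 525 GB heap on a 1.5 TB host, top sees roughly 3 × 525 GB ≈ 1.57 TB of usage, resulting in a reading greater than 100%.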
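The arithmetic in the example above can be checked with a small helper. It is a sketch that assumes the three-view multiplication described here and decimal units (1.5 TB = 1500 GB); `apparent_pmem` is a hypothetical name. Real top readings also include non-heap memory, so treat the result as an approximation of the heap's contribution only.

```shell
# Hypothetical helper: estimate the %MEM that top reports for a multi-mapped heap.
# Arguments: heap size, number of virtual views, host RAM (heap and RAM in the same unit).
apparent_pmem() {
    awk -v heap="$1" -v views="$2" -v ram="$3" \
        'BEGIN { printf "%.0f\n", heap * views * 100 / ram }'
}

apparent_pmem 525 3 1500   # 525 GB heap x 3 views on a 1500 GB host; prints 105
```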

High CPU Utilization (Expected Initialization)

A CPU spike during startup is also expected when the JVM option -XX:+AlwaysPreTouch is used. Instead of waiting for the application to request memory, the Java Virtual Machine (JVM) proactively "touches" every page of the heap during startup, forcing the operating system to zero the page and back it with physical RAM. This temporary CPU surge is a one-time cost that prevents unpredictable latency "hiccups" when memory is accessed for the first time under production load.
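To confirm that a running member actually uses ZGC and pre-touches its heap (and therefore that a startup CPU spike is expected), the JVM's active flags can be listed with jcmd. The sketch below assumes a JDK with jcmd on the PATH, run as the same user as the GemFire member; `zgc_startup_flags` is a hypothetical helper name. AlwaysPreTouch is off by default, so if -XX:+AlwaysPreTouch does not appear, pre-touch is not enabled.

```shell
# Hypothetical helper: read `jcmd <PID> VM.flags` output on stdin and print
# only the ZGC and heap pre-touch flags, one per line.
zgc_startup_flags() {
    tr ' ' '\n' | grep -E '^-XX:[+-](UseZGC|AlwaysPreTouch)$'
}

# Usage (replace 12345 with the GemFire member's PID):
# jcmd 12345 VM.flags | zgc_startup_flags
```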

Resolution

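Core Rule: Tools such as top that report RSS (Resident Set Size) will over-report ZGC memory usage, because RSS counts each heap page once for every virtual view it is mapped into. PSS (Proportional Set Size) divides each page among all of its mappings, so a multi-mapped heap page is counted only once. PSS is therefore the only accurate metric for determining the actual physical RAM footprint on the hardware.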

Metric | Definition | Importance for GemFire
USS | Unique Set Size | Memory used exclusively by the Java process.
PSS | Proportional Set Size | The truth: the actual physical RAM footprint on the hardware.
RSS | Resident Set Size | The artifact: counts the heap more than once; ignore it for ZGC.
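The gap between the RSS artifact and the PSS footprint can be seen for a single member by reading both values from the kernel. A minimal sketch, assuming Linux 4.14 or later (which provides /proc/PID/smaps_rollup); `rss_vs_pss` is a hypothetical name. A ratio well above 1 on a ZGC member is the multi-counting described above, while ordinary processes usually stay close to 1.

```shell
# Hypothetical helper: print Rss and Pss (kB) and the Rss/Pss ratio from an
# smaps_rollup file. Pass /proc/<PID>/smaps_rollup for a live process.
rss_vs_pss() {
    awk '/^Rss:/ { rss = $2 }
         /^Pss:/ { pss = $2 }
         END {
             if (pss > 0) printf "Rss %s kB  Pss %s kB  ratio %.2f\n", rss, pss, rss / pss
             else { print "no Pss data" > "/dev/stderr"; exit 1 }
         }' "$1"
}

# Usage (replace 12345 with the GemFire member's PID; sudo may be needed):
# rss_vs_pss /proc/12345/smaps_rollup
```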
 

Option A: Using smem (Recommended)

If smem is installed, run the following command to see the actual physical footprint (PSS) of all running processes:

sudo smem -rtk
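smem can also narrow the report to JVM processes and show USS, PSS, and RSS side by side, which makes the RSS over-count easy to spot. A sketch, assuming the GemFire members' command lines contain `java` and using smem's -P (process filter regex), -c (columns), -s (sort field), -r (reverse), -k (unit suffixes), and -t (totals) options; the wrapper name `gemfire_smem` is hypothetical.

```shell
# Hypothetical wrapper: USS/PSS/RSS for processes whose command line matches a
# regex (default "java"), largest PSS first, with unit suffixes and totals.
gemfire_smem() {
    if ! command -v smem >/dev/null 2>&1; then
        echo "smem is not installed" >&2
        return 1
    fi
    sudo smem -k -t -r -s pss -c "pid user command uss pss rss" -P "${1:-java}"
}

# Usage:
# gemfire_smem
```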

Option B: Manual PSS Check (No installation required)

To pull the PSS directly from the Linux kernel for a specific PID (process ID), use:

grep '^Pss:' /proc/[PID]/smaps_rollup
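To run that check for every GemFire member on a host in one step, the PIDs can be found with pgrep. A minimal sketch, assuming Linux 4.14 or later and that the members were started with gfsh, so their command lines contain a `...distributed.ServerLauncher` or `...distributed.LocatorLauncher` class name; adjust the pattern if your members are launched differently. Both function names are hypothetical.

```shell
# Hypothetical helper: print the total PSS (kB) of one PID from smaps_rollup;
# prints nothing if the file does not exist or cannot be read.
pss_kb() {
    awk '/^Pss:/ { print $2 }' "/proc/$1/smaps_rollup" 2>/dev/null
}

# Hypothetical helper: "PID PSS_kB" for every process matching the assumed
# gfsh launcher class names (ServerLauncher / LocatorLauncher).
gemfire_pss() {
    for pid in $(pgrep -f 'distributed\.(Server|Locator)Launcher'); do
        pss=$(pss_kb "$pid")
        printf '%s %s\n' "$pid" "${pss:-unreadable}"
    done
}

# Usage (sudo may be needed to read other users' processes):
# gemfire_pss
```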

To view the PSS for all running processes on the host at once:

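# Total PSS per process in kB, smallest first. Run with sudo to include other users' processes.
for pid in /proc/[0-9]*; do
    # Sum all Pss: lines; yields 0 if the process has exited or smaps is unreadable.
    pss=$(grep '^Pss:' "$pid/smaps" 2>/dev/null | awk '{sum += $2} END {print sum + 0}')
    [ "$pss" -gt 0 ] || continue    # skip kernel threads, exited and unreadable processes
    comm=$(cat "$pid/comm" 2>/dev/null) || continue
    # PSS goes first so that command names containing spaces do not shift the sort key.
    printf '%s kB  %s %s\n' "$pss" "${pid#/proc/}" "$comm"
done | sort -n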