GemFire: JVM Pauses and Cluster Degradation Triggered by Thread Monitoring in GemFire with Azul Zulu and Zing

Article ID: 396045

Products

VMware Tanzu Data Suite

Issue/Introduction

The GemFire cluster experiences JVM pauses across multiple nodes, but the affected members are not kicked out of the cluster.

Environment

GemFire 10.1.x running on Azul Zulu (an OpenJDK-based distribution) or Azul Zing

Cause

When a thread becomes stuck or takes an extended time to complete (for example, during authentication or class loading), GemFire's Thread Monitor may activate to collect diagnostic data. In GemFire 10.1, this includes capturing thread lock information when the JVM vendor is Azul. Collecting this information triggers safepoint operations that can cause significant JVM pauses, degrading overall performance and cluster stability.

This can occur even on Azul Zing, despite its significant improvements in reducing JVM pauses.
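The cost described above comes from asking the JVM for lock ownership along with stack traces. The Java sketch below shows both kinds of request using the standard `java.lang.management.ThreadMXBean` API, calling `getThreadInfo(ids, lockedMonitors, lockedSynchronizers)` with lock data switched on and off. It is a hedged illustration, not GemFire's Thread Monitor code, and the class name `LockInfoSketch` is hypothetical. On HotSpot-based JVMs a stack-trace snapshot like this is typically taken at a safepoint, so repeating it often, or for many threads, can add up to the kind of pauses this article describes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustration only: NOT GemFire's Thread Monitor implementation.
public class LockInfoSketch {

    /**
     * Takes one batched snapshot of the given threads. When withLocks is true,
     * the JVM is also asked for locked monitors and ownable synchronizers,
     * i.e. the extra lock data discussed in this article.
     */
    static ThreadInfo[] collect(long[] threadIds, boolean withLocks) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request lock data the JVM says it supports; otherwise
        // getThreadInfo throws UnsupportedOperationException.
        boolean monitors = withLocks && mx.isObjectMonitorUsageSupported();
        boolean synchronizers = withLocks && mx.isSynchronizerUsageSupported();
        return mx.getThreadInfo(threadIds, monitors, synchronizers);
    }

    public static void main(String[] args) {
        long[] ids = { Thread.currentThread().getId() };
        Object lock = new Object();
        synchronized (lock) { // hold a monitor so there is something to report
            ThreadInfo withLocks = collect(ids, true)[0];
            ThreadInfo withoutLocks = collect(ids, false)[0];
            System.out.println("monitors reported with lock collection:    "
                    + withLocks.getLockedMonitors().length);
            System.out.println("monitors reported without lock collection: "
                    + withoutLocks.getLockedMonitors().length);
        }
    }
}
```

When lock data is not requested, `getLockedMonitors()` returns an empty array, so the second snapshot carries less information but also asks less of the JVM.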

Resolution

1. Upgrade to GemFire 10.2.0, which significantly reduces the thread monitor's impact through the following changes:

  • GEM-14724 (Performance Fix): Fixed a critical issue in which serial collection of thread information, specifically lock-related data, caused significant JVM pauses and cluster instability. The fix includes:
    • Switching to batch-based thread information retrieval.
    • Disabling collection of lock-related data by default (Release Notes, 10.2).
  • GEM-14891 (Back-off Protocol): Added a back-off protocol for ThreadMonitor calls that exceed a threshold. This prevents the monitor from repeatedly hammering a system that is already under heavy load, reducing the risk that monitoring worsens existing performance problems (Release Notes, 10.2).
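The back-off idea behind GEM-14891 can be sketched as a small policy object: after a collection runs longer than a threshold, the monitor skips the next few cycles, doubles the number of skipped cycles (up to a cap) each time collection is slow again, and resets once a collection is fast. This is a hypothetical illustration; the class `BackoffPolicy`, its parameters, and the doubling strategy are assumptions, not the algorithm GemFire 10.2 actually uses.

```java
// Hypothetical back-off policy for a periodic thread monitor.
// NOT GemFire's GEM-14891 implementation; names and strategy are assumptions.
public class BackoffPolicy {
    private final long thresholdMillis; // a collection slower than this counts as "slow"
    private final int maxSkip;          // upper bound on consecutive skipped cycles
    private int skipRemaining = 0;      // cycles still to skip before collecting again
    private int nextSkip = 1;           // cycles to skip after the next slow collection

    public BackoffPolicy(long thresholdMillis, int maxSkip) {
        if (thresholdMillis <= 0 || maxSkip < 1) {
            throw new IllegalArgumentException("threshold must be > 0 and maxSkip >= 1");
        }
        this.thresholdMillis = thresholdMillis;
        this.maxSkip = maxSkip;
    }

    /** Returns true if the monitor should collect thread info on this cycle. */
    public boolean shouldCollect() {
        if (skipRemaining > 0) {
            skipRemaining--;
            return false;
        }
        return true;
    }

    /** Records how long the last collection took and adjusts the back-off. */
    public void recordDuration(long elapsedMillis) {
        if (elapsedMillis > thresholdMillis) {
            skipRemaining = nextSkip;
            nextSkip = Math.min(nextSkip * 2, maxSkip);
        } else {
            nextSkip = 1; // system looks healthy again: reset the back-off
        }
    }

    /**
     * Runs the policy for a number of cycles using simulated collection times.
     * Returns one character per cycle: 'C' = collected, '-' = skipped.
     */
    public static String simulate(BackoffPolicy policy, long[] durations, int cycles) {
        if (durations.length == 0) {
            throw new IllegalArgumentException("need at least one simulated duration");
        }
        StringBuilder pattern = new StringBuilder();
        int next = 0;
        for (int i = 0; i < cycles; i++) {
            if (policy.shouldCollect()) {
                // Reuse the last simulated duration once the array is exhausted.
                long elapsed = durations[Math.min(next++, durations.length - 1)];
                policy.recordDuration(elapsed);
                pattern.append('C');
            } else {
                pattern.append('-');
            }
        }
        return pattern.toString();
    }

    public static void main(String[] args) {
        BackoffPolicy policy = new BackoffPolicy(100, 8);
        System.out.println(simulate(policy, new long[] {250, 300, 50}, 7));
    }
}
```

With a 100 ms threshold, simulated durations of 250 ms, 300 ms, and 50 ms produce the pattern `C-C--CC`. In a real monitor, the elapsed time would come from timing a batched `getThreadInfo` call with `System.nanoTime()` rather than from an array.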

2. If upgrading to GemFire 10.2.0 does not resolve the issue, disable thread lock collection on all nodes running Azul Zulu or Zing by adding the JVM property -Dgemfire.threadmonitor.showLocks=false. This reverts to the less intrusive monitoring mode used by earlier GemFire versions.
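To illustrate how a -D override like this can take precedence over a vendor-based default, here is a hypothetical Java sketch. Only the property name `gemfire.threadmonitor.showLocks` comes from this article; the class `ShowLocksSketch`, the decision logic, and the assumption that Azul JVMs report a `java.vendor` value starting with "Azul" are illustrative, not GemFire source code.

```java
// Hypothetical sketch: NOT GemFire source code. Only the property name
// gemfire.threadmonitor.showLocks is taken from the article.
public class ShowLocksSketch {
    static final String SHOW_LOCKS_PROPERTY = "gemfire.threadmonitor.showLocks";

    /**
     * Decides whether lock information should be collected.
     * An explicit property value always wins; otherwise mimic the 10.1
     * behavior described in the Cause section and collect locks only on
     * Azul JVMs (assumed to report a java.vendor starting with "Azul").
     */
    static boolean shouldCollectLocks(String javaVendor, String showLocksValue) {
        if (showLocksValue != null) {
            return Boolean.parseBoolean(showLocksValue.trim());
        }
        return javaVendor != null && javaVendor.startsWith("Azul");
    }

    public static void main(String[] args) {
        String vendor = System.getProperty("java.vendor");
        String override = System.getProperty(SHOW_LOCKS_PROPERTY);
        System.out.println("java.vendor = " + vendor);
        System.out.println(SHOW_LOCKS_PROPERTY + " = " + override);
        System.out.println("collect lock info: " + shouldCollectLocks(vendor, override));
    }
}
```

After compiling, running `java -Dgemfire.threadmonitor.showLocks=false ShowLocksSketch` should report `collect lock info: false` regardless of vendor. For servers started with gfsh, JVM arguments are passed with the `--J` option, for example `start server --name=server1 --J=-Dgemfire.threadmonitor.showLocks=false` (the server name is a placeholder).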

3. The most conservative approach is to disable thread monitoring completely.
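The article does not name the setting that turns thread monitoring off. In Apache Geode, on which GemFire is based, the distributed-system property `thread-monitor-enabled` controls the thread monitor. Assuming GemFire 10.x honors the same name (confirm it in the GemFire properties reference for your version), a minimal Java sketch that produces the corresponding `gemfire.properties` entry is shown here; the class name `DisableThreadMonitor` is hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

// Sketch: build gemfire.properties content that turns the thread monitor off.
// Assumption: "thread-monitor-enabled" is the Apache Geode property name;
// verify it for your GemFire version before relying on it.
public class DisableThreadMonitor {

    static Properties threadMonitorOff() {
        Properties props = new Properties();
        props.setProperty("thread-monitor-enabled", "false");
        return props;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // store() writes a comment line, a timestamp line, and key=value pairs.
        threadMonitorOff().store(out, "Disable GemFire thread monitoring");
        System.out.print(out);
    }
}
```

In an embedded member, the same `Properties` object could be passed to `new CacheFactory(props)`; for gfsh-started servers, the setting could instead be supplied as a `gemfire.`-prefixed system property (`--J=-Dgemfire.thread-monitor-enabled=false`). Disabling the monitor also gives up the diagnostic data it collects on stuck threads, so this option trades diagnostics for stability.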