The GemFire application appears to block indefinitely while reading the error response from the GemFire cluster.

Article ID: 423365

Products

VMware Tanzu Gemfire

Issue/Introduction

A GemFire client experienced a rapid increase in blocked operations, causing most requests to queue while waiting for responses from the cluster. This behavior was observed following suspected network connectivity issues and was accompanied by non-heap memory growth in the client application process.

Environment

VMware Tanzu GemFire 10.1.0

Cause

This issue can be triggered by non-heap memory growth, typically in direct buffers or threads, which leads to memory pressure and eventually to out-of-memory conditions in the client JVM. There is also a known product issue in which a critical GemFire client thread becomes stuck while writing responses back to the client. Other client operations that need to replicate information to cluster members then hang behind that thread.

Resolution

A future GemFire 10.1 patch release is planned to include resiliency improvements so that multi-hop functions do not return exceptions or block on critical client threads. Subscribe to this Knowledge Base article to receive updates on the availability and timeline of this patch.

Workaround

Until the patch is available, use the following workarounds:

  • Restart affected client-side applications connected to the GemFire cluster to clear stuck threads and reclaim non-heap memory.
  • When possible, perform controlled rolling restarts of clients to minimize impact on traffic.

Recommendations

When a Java process encounters an OutOfMemoryError, the JVM may be left in an inconsistent state, and the process should be terminated immediately rather than allowed to keep running. Configure one of the following JVM flags on all GemFire client processes at startup:

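  • -XX:+CrashOnOutOfMemoryError – crashes the JVM when an OutOfMemoryError is thrown and produces crash files (including a core file, if core dumps are enabled on the host).
  • -XX:+ExitOnOutOfMemoryError – exits the JVM when an OutOfMemoryError is thrown, without producing crash files.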
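
To confirm that one of these flags is actually in effect, a client application could check its own startup arguments. The following is a minimal Java sketch under stated assumptions, not part of GemFire: the class name OomFlagCheck and the method hasOomExitFlag are hypothetical, and the check only looks for the literal flag strings in the arguments reported by the standard RuntimeMXBean, so it does not account for a later -XX:- override of the same flag.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical startup guard; class and method names are illustrative only. */
final class OomFlagCheck {

    /** Returns true if the argument list enables an exit-on-OOM or crash-on-OOM flag. */
    static boolean hasOomExitFlag(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+ExitOnOutOfMemoryError")
                || jvmArgs.contains("-XX:+CrashOnOutOfMemoryError");
    }

    public static void main(String[] args) {
        // Arguments that the running JVM reports it was started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (!hasOomExitFlag(jvmArgs)) {
            System.err.println("WARNING: neither -XX:+ExitOnOutOfMemoryError nor "
                    + "-XX:+CrashOnOutOfMemoryError is set on this JVM");
        }
    }
}
```

A check like this only warns. Whether a client should refuse to start without the flag is a deployment decision.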

To further harden GemFire client processes:

  • Reduce OS ulimit values and/or use cgroups (or equivalent container limits) to enforce CPU and memory guardrails on client processes.
  • Add -XX:+ExitOnOutOfMemoryError when starting the client JVM process to ensure the process will exit if it runs out of memory.
  • Configure function and operation timeouts so the GemFire cluster can detect and disconnect unresponsive clients instead of allowing operations to block indefinitely.

Additional Information

To help prevent recurrence of this behavior:

  • Ensure all functions running on the GemFire cluster are configured with appropriate timeouts so that long-running or stuck operations can be aborted.
  • Monitor the health of the host and application (CPU, memory including non-heap, GC, and thread counts) and respond promptly to deviations or recurring exceptions.
  • Run GemFire client applications with JVM parameters that force exit on OutOfMemoryError to avoid running in an inconsistent memory state, for example, -XX:+ExitOnOutOfMemoryError (optionally combined with diagnostic flags such as -XX:+HeapDumpOnOutOfMemoryError to capture heap dumps).
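
The non-heap, direct buffer, and thread count figures mentioned above can be read from inside the client JVM through the standard java.lang.management API. The sketch below is one possible way to sample them, not a GemFire feature: the class ClientHealthSnapshot, its fields, and the thresholds in main are hypothetical and chosen only for illustration. Note that the JVM reports direct buffer usage through the "direct" buffer pool, separately from its non-heap memory usage figure.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical point-in-time sample of JVM metrics relevant to this issue. */
final class ClientHealthSnapshot {
    final long nonHeapUsedBytes;   // metaspace, code cache, and similar pools
    final long directBufferBytes;  // memory held by direct ByteBuffers
    final int liveThreads;         // live threads, daemon and non-daemon

    ClientHealthSnapshot(long nonHeapUsedBytes, long directBufferBytes, int liveThreads) {
        this.nonHeapUsedBytes = nonHeapUsedBytes;
        this.directBufferBytes = directBufferBytes;
        this.liveThreads = liveThreads;
    }

    /** Samples the current JVM using the platform MXBeans. */
    static ClientHealthSnapshot capture() {
        long nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
        long direct = 0L;
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                // getMemoryUsed() may return -1 when no estimate is available.
                direct = Math.max(0L, pool.getMemoryUsed());
            }
        }
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        return new ClientHealthSnapshot(nonHeap, direct, threads);
    }

    /** Returns true if direct buffer usage or thread count is above the given limit. */
    boolean exceeds(long maxDirectBufferBytes, int maxThreads) {
        return directBufferBytes > maxDirectBufferBytes || liveThreads > maxThreads;
    }

    public static void main(String[] args) {
        ClientHealthSnapshot s = capture();
        System.out.printf("nonHeap=%d bytes, direct=%d bytes, threads=%d%n",
                s.nonHeapUsedBytes, s.directBufferBytes, s.liveThreads);
        // Arbitrary example limits: 512 MiB of direct buffers, 2000 threads.
        if (s.exceeds(512L * 1024 * 1024, 2000)) {
            System.err.println("WARNING: client JVM is above the configured guardrails");
        }
    }
}
```

In practice a sample like this would be taken periodically and exported to whatever monitoring system is already in place, so that steady growth is visible before the JVM reaches an out-of-memory condition.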