A GemFire client experienced a rapid increase in blocked operations, causing most requests to queue while waiting for responses from the cluster. This behavior was observed following suspected network connectivity issues and was accompanied by non-heap memory growth in the client application process.
VMware Tanzu GemFire 10.1.0
This issue can be triggered by non-heap memory growth, typically in direct buffers or threads, which leads to memory pressure and eventually to out-of-memory conditions in the client JVM. There is also a known product issue in which a critical GemFire client thread becomes stuck while writing responses back to the client, so other client operations that need to replicate information to cluster members hang behind it.
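To watch for the non-heap growth described above, a client application could sample the JVM's own management beans. The following is a minimal sketch, not a GemFire feature: the class name `NonHeapProbe` and its method names are hypothetical, and it uses only the standard `java.lang.management` API.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper that samples non-heap indicators of the current JVM. */
public class NonHeapProbe {

    /** Bytes used by the named buffer pool (for example "direct"), or -1 if no such pool exists. */
    public static long bufferPoolBytes(String poolName) {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if (pool.getName().equals(poolName)) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    /** Number of live threads, daemon and non-daemon. */
    public static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** JVM-managed non-heap memory (metaspace, code cache, and similar); excludes direct buffers. */
    public static long nonHeapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        // Print one sample; a real client would log this periodically and alert on sustained growth.
        System.out.printf("direct=%d bytes, threads=%d, nonHeapUsed=%d bytes%n",
                bufferPoolBytes("direct"), liveThreadCount(), nonHeapUsedBytes());
    }
}
```

Logging these values at a fixed interval makes a steady climb in direct buffer usage or thread count visible before the client reaches an out-of-memory condition.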
A future GemFire 10.1 patch release is planned to include resiliency improvements so that multi-hop functions do not return exceptions or block on critical client threads. Subscribe to this Knowledge Base article to receive updates on the availability and timeline of this patch.
Workaround
Until the patch is available, use the following workarounds:
Recommendations
When a Java process encounters an OutOfMemoryError, the JVM may be left in an inconsistent state, so the process should be terminated immediately rather than allowed to continue running. Configure one of the following JVM flags on all GemFire client processes at startup:
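As a sketch of how a client could confirm at startup that it was launched with such a flag: the check below assumes the flags in question are the standard HotSpot options `-XX:+ExitOnOutOfMemoryError` and `-XX:+CrashOnOutOfMemoryError`. That is an assumption, not something this article confirms, and the class name `OomFlagCheck` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical startup check for an exit-on-OOM JVM flag. */
public class OomFlagCheck {

    /** Returns true if either assumed HotSpot flag appears in the given JVM arguments. */
    public static boolean hasOomExitFlag(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+ExitOnOutOfMemoryError")
                || jvmArgs.contains("-XX:+CrashOnOutOfMemoryError");
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (!hasOomExitFlag(jvmArgs)) {
            System.err.println("WARNING: no exit-on-OutOfMemoryError flag set; "
                    + "this JVM may keep running after an OutOfMemoryError.");
        }
    }
}
```

Calling such a check early in client startup surfaces a missing flag in the logs instead of leaving it to be discovered during an incident.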
To further harden GemFire client processes:
To help prevent recurrence of this behavior: