Tanzu GemFire Client Operations Using Virtual Threads are Not Supported


Article ID: 431535


Updated On:

Products

VMware Tanzu Data Intelligence, VMware Tanzu GemFire, VMware Tanzu Data Suite

Issue/Introduction

When a Tanzu GemFire client application runs its operations on Java 21 Virtual Threads, it can experience severe data inconsistencies, silent failures (such as remove() or destroy operations that return a success status without being executed), application hangs, and cluster-wide thread starvation.

While Tanzu GemFire is fully validated to run on the JDK 21 runtime environment, the Virtual Thread concurrency model is explicitly not supported.

Environment

 

  • Tanzu GemFire with:

    • JDK 21 (with Virtual Threads enabled on the client application framework, e.g., Spring Boot)

    • Java Client / Spring Boot Application

 

 

Cause

In Java 21, a virtual thread that blocks while inside a synchronized block or method (for example, on a network operation performed under a lock) becomes pinned to its underlying carrier thread and cannot be unmounted. Because the default number of carrier threads equals the number of available CPU cores, these threads can be quickly exhausted during I/O-intensive operations. This can lead to thread starvation, deadlocks or hangs, and ultimately cluster instability.

The silent failures and data inconsistencies are due to an architectural incompatibility between Java 21 Virtual Threads and GemFire’s client thread-tracking mechanism:

  1. Thread ID Counter Wrap-Around: GemFire clients track and assign internal threadID namespaces using an AtomicLong counter that wraps around at 1,000,000. Because a high-throughput application (e.g., thousands of requests per second) spawns a new Virtual Thread for every request, this entire 1,000,000 ID space can be completely exhausted and wrapped in minutes.

  2. State Retention Collision: The GemFire server's DistributedEventTracker intentionally retains thread state histories for 5 minutes (controlled by message-tracking-timeout) to ensure exactly-once semantics during standard client retries. Because Virtual Threads can wrap the counter faster than this 5-minute expiration window, the server may still hold the state recorded for an old threadID when that ID is reused.

  3. Duplicate Rejection (Silent Dropping): When a new Virtual Thread reuses an old threadID and sends a low sequence number operation, the server compares it to the previously tracked high sequence number (sequenceID < highestSequenceNumberSeen). The server evaluates this as a duplicate retry operation, skips the execution (e.g., dropping the delete request), and returns a SUCCESS status back to the client.
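The three steps above can be illustrated with a toy model. This is a minimal sketch, not GemFire code: the class names ToyThreadIdAllocator and ToyEventTracker are hypothetical, and the tracker keeps only the highest sequence number per thread ID, using the same comparison quoted in step 3. Entry expiry is deliberately left out, which corresponds to the wrap happening inside the tracking window.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of thread ID reuse and duplicate rejection. All names are hypothetical. */
public class ThreadIdWrapSketch {
    static final long ID_SPACE = 1_000_000L; // wrap point stated in the article

    /** Client side: hands out IDs 0..ID_SPACE-1 and then starts over. */
    static final class ToyThreadIdAllocator {
        private final AtomicLong counter = new AtomicLong();

        long next() {
            return counter.getAndIncrement() % ID_SPACE;
        }
    }

    /** Server side: remembers the highest sequence number seen per thread ID. */
    static final class ToyEventTracker {
        private final Map<Long, Long> highestSeen = new HashMap<>();

        /** Returns true if the operation is applied, false if it is skipped as a duplicate. */
        boolean apply(long threadId, long sequenceId) {
            Long highest = highestSeen.get(threadId);
            if (highest != null && sequenceId < highest) {
                return false; // treated as a retry; the client would still see success
            }
            highestSeen.put(threadId, sequenceId);
            return true;
        }
    }

    public static void main(String[] args) {
        ToyThreadIdAllocator ids = new ToyThreadIdAllocator();
        ToyEventTracker server = new ToyEventTracker();

        long firstId = ids.next();
        server.apply(firstId, 50_621L); // state the server retains for this ID

        // One new thread per request consumes the rest of the ID space.
        for (long i = 1; i < ID_SPACE; i++) {
            ids.next();
        }

        long reusedId = ids.next(); // the counter has wrapped back to firstId
        boolean applied = server.apply(reusedId, 4L); // low sequence number from a new thread
        System.out.println("reused=" + (reusedId == firstId) + " applied=" + applied);
    }
}
```

In this model the second operation is skipped even though it comes from a different thread, because the tracker can only distinguish callers by thread ID.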

Symptoms and Log Evidence in Server-Side Logs:


Look for the following log patterns on your GemFire servers, which indicate that the EventTracker is actively dropping incoming events as duplicates:

No version tag can be found in cluster when retrying the following event: [id=<some_id>;seq=<low sequence number>;op=<some_op>]
EventTracker: Operation dropped as duplicate for threadID=<some_id>, sequenceID=<low_sequence_number> (highestSequenceNumberSeen=<very_high_sequence_number>)

For example:

  • some_id = 905114
  • low sequence number = 4
  • some_op = DESTROY
  • very_high_sequence_number = 50621
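When scanning large server logs, a small parser can help pull out the affected thread IDs. The following is a minimal sketch that assumes log lines follow the "Operation dropped as duplicate" template shown above exactly; real log lines may carry extra prefixes or differ in detail, and the class name, method names, and gap threshold are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts the numeric fields from a "dropped as duplicate" log line. Names are hypothetical. */
public class DroppedEventLogScan {
    // Built from the log template quoted in this article; adjust if your log format differs.
    private static final Pattern DROPPED = Pattern.compile(
        "Operation dropped as duplicate for threadID=(\\d+), sequenceID=(\\d+) "
            + "\\(highestSequenceNumberSeen=(\\d+)\\)");

    /** Returns {threadID, sequenceID, highestSequenceNumberSeen}, or empty if the line does not match. */
    static Optional<long[]> parse(String line) {
        Matcher m = DROPPED.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        });
    }

    /** A large gap between the two sequence numbers hints at ID reuse rather than a genuine retry. */
    static boolean looksLikeIdReuse(long sequenceId, long highestSeen, long gapThreshold) {
        return highestSeen - sequenceId >= gapThreshold;
    }

    public static void main(String[] args) {
        String line = "EventTracker: Operation dropped as duplicate for threadID=905114, "
            + "sequenceID=4 (highestSequenceNumberSeen=50621)";
        parse(line).ifPresent(f -> System.out.println(
            "threadID=" + f[0] + " suspicious=" + looksLikeIdReuse(f[1], f[2], 1_000)));
    }
}
```

The gap threshold of 1,000 is an arbitrary illustration; a genuine client retry normally carries a sequence number close to the highest one the server has seen.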

Resolution

Recommendations:

  • Configure your Spring Boot or Java client applications to use standard Platform Threads (bounded thread pools) instead of Virtual Threads for all GemFire client cache operations.

     

  • Aligning your concurrency model with standard thread pools prevents rapid thread ID recycling and keeps the counter well within the 1,000,000 limit over the lifetime of the application threads.
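One way to follow these recommendations is to route GemFire calls through a fixed-size pool of platform threads. The sketch below uses only the standard java.util.concurrent API; the pool size, the "gemfire-client-" thread name prefix, and the placeholder task are assumptions for illustration, and the actual GemFire call is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Runs client operations on a bounded pool of platform threads. Names are hypothetical. */
public class BoundedClientPoolSketch {

    /** Creates a fixed-size pool; threads built with new Thread(...) are platform threads. */
    static ExecutorService newClientPool(int size) {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newFixedThreadPool(size, task -> {
            Thread t = new Thread(task, "gemfire-client-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        });
    }

    /** Submits the given number of operations and returns the names of the threads that ran them. */
    static Set<String> runOperations(int poolSize, int operations) {
        ExecutorService pool = newClientPool(poolSize);
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < operations; i++) {
                pending.add(pool.submit(() -> {
                    // Placeholder: a GemFire region operation would go here.
                    workerNames.add(Thread.currentThread().getName());
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface failures
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return workerNames;
    }

    public static void main(String[] args) {
        Set<String> names = runOperations(4, 10_000);
        System.out.println("Operations ran on " + names.size() + " distinct threads");
    }
}
```

Because the same few long-lived threads serve every request, the number of distinct threads issuing GemFire operations stays at the pool size no matter how many operations are submitted. In Spring Boot 3.2 and later, virtual threads are opt-in through the spring.threads.virtual.enabled property, so leaving it unset or set to false keeps the framework on platform threads.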

Risky Short-Term Workaround

If Virtual Threads cannot immediately be disabled, you can reduce the server's tracking timeout to be shorter than the time it takes your client application to wrap the 1,000,000 thread ID counter:

  • Property: message-tracking-timeout (Default is 300000 ms / 5 minutes)

Warning: Shrinking this duplicate detection window severely compromises GemFire's ability to filter out legitimate retries during network hiccups, potentially breaking exactly-once processing guarantees for non-idempotent operations.
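The arithmetic behind this workaround can be sketched as follows. The example assumes every request consumes one fresh thread ID, which is a simplification, and the class and method names are hypothetical. It only estimates how long the 1,000,000 ID space lasts at a given rate; it does not change any GemFire setting.

```java
/** Estimates how quickly the thread ID space wraps at a given request rate. Names are hypothetical. */
public class TrackingTimeoutSketch {
    static final long ID_SPACE = 1_000_000L;         // wrap point stated in the article
    static final long DEFAULT_TIMEOUT_MS = 300_000L; // default message-tracking-timeout

    /** Milliseconds until the ID space wraps if each second consumes this many fresh IDs. */
    static long millisToWrap(long newThreadsPerSecond) {
        if (newThreadsPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return ID_SPACE * 1000L / newThreadsPerSecond;
    }

    /** True when IDs are reused before the default tracking window has expired. */
    static boolean wrapsWithinDefaultWindow(long newThreadsPerSecond) {
        return millisToWrap(newThreadsPerSecond) < DEFAULT_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        for (long rate : new long[] {1_000L, 5_000L, 20_000L}) {
            System.out.println(rate + " threads/s: wraps in " + millisToWrap(rate)
                + " ms, within default window: " + wrapsWithinDefaultWindow(rate));
        }
    }
}
```

Under this assumption, any sustained rate above roughly 3,333 new threads per second (1,000,000 IDs divided by 300 seconds) wraps the counter inside the default 5-minute window, so the timeout would have to be set below the computed wrap time to avoid collisions.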

Additional Information

Tanzu GemFire Java Support Documentation: Confirms that while GemFire runs on JDK 21, it does not currently support Virtual Threads.