Client Timeout and Duplicate Event Messages in GemFire

VMware Tanzu GemFire

GemFire clients may disconnect from servers or retry operations when performing region operations such as PutAll and RemoveAll.

Typical symptoms in the logs include:

Client timeouts / forced disconnection by server

Slow server response detection

 [warn ... <Timer-0>] 15 seconds have elapsed waiting for a response from … for .. thread .. ServerConnection ..

Event replay warnings

Client-side retries / failures

This behavior generally occurs when clients are unable to process server events fast enough. Contributing factors include:

Very small async queue size (async-max-queue-size=8), leading to queue overflow under heavy load.
No async distribution timeout (async-distribution-timeout=0), so slow receivers are never timed out.
Low client timeout threshold (default 10s), so clients disconnect during transient delays.
Unresponsive clients not removed (remove-unresponsive-client=false), so they continue to hold server resources.
Large batch operations (PutAll, RemoveAll), which increase processing latency.
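
For reference, the server-side settings named above are standard GemFire properties. As a sketch, assuming they are set in the server's gemfire.properties file (they may be supplied another way in a given deployment), the configuration described in this article would look roughly like this:

```properties
# Configuration as described in the contributing factors above.
# These are the values under which the issue was observed, not recommendations.

# Maximum size of the async queue, in megabytes.
async-max-queue-size=8

# Milliseconds to wait on a slow receiver before switching to async queuing;
# 0 disables the timeout.
async-distribution-timeout=0

# Whether the server removes clients that stop responding.
remove-unresponsive-client=false
```

Any new values should be chosen through load testing rather than copied from an example.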

To mitigate these issues, apply the following changes:

Increase client socket/connection timeouts to accommodate GC pauses or network delays.
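
One place where a client timeout can be raised is the read-timeout attribute of the client pool, which is expressed in milliseconds. The fragment below is a minimal sketch of a pool declaration in a client cache.xml. The pool name, locator host, port, and the 30000 ms value are hypothetical placeholders, not recommended settings.

```xml
<!-- Fragment of a client cache.xml; the enclosing client-cache element is omitted. -->
<!-- read-timeout is in milliseconds. 30000 is an illustrative value only. -->
<pool name="examplePool" read-timeout="30000">
  <!-- Hypothetical locator address; replace with the real one. -->
  <locator host="locator.example.com" port="10334"/>
</pool>
```

The value should leave headroom for the longest GC pause or network delay expected during large PutAll or RemoveAll operations, and should be validated under representative load.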

Note: These changes are iterative and require thorough testing before being applied in production.
