GemFire Client Receives "Socket Timeout" Exceptions After Server Removal

Article ID: 414065

Products

VMware Tanzu GemFire

Issue/Introduction

GemFire client applications continue to report warnings like the following even after a server node (e.g., serverX) is removed from the cluster:

WARN [org.apache.geode.cache.client.internal.ConnectionFactoryImpl] node1 poolTimer-DEFAULT - Could not connect to serverX java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)

Environment

All supported GemFire versions

Cause

This warning occurs because the GemFire client is still attempting to connect to serverX, which has already been removed from the cluster. Common root causes include outdated client pools, obsolete locator configurations, stale DNS entries, and application logic that retains legacy endpoints.

Resolution

This is a known issue and will be resolved in a future release of Tanzu GemFire. Subscribe to this article to receive updates. Until the fix is available, follow the steps below to resolve persistent socket timeout errors after a server is removed:

  1. Verify Client Configuration: check the pool settings for any hardcoded references to serverX.
  2. Clear Client Pool Configuration:
    • The client may have cached the old server list. Force the client to refresh its connection pool by restarting the client application or explicitly invalidating the pool.
  3. Check Locator Configuration:
    • If a locator still references serverX, restart the locator or update its configuration to remove the stale entry.
  4. Inspect Client-Side Pool Settings for any misconfiguration.
  5. Force Refresh of Client-Server Connections.
  6. Check for Stale DNS Entries or Hostname Resolution Issues.
  7. Update Client Libraries:
    • Ensure the client is using a version of the GemFire client libraries that is compatible with the cluster.
  8. Check for Application Issues:
    • If the client application programmatically specifies server endpoints (e.g., via PoolFactory.addServer()), verify that serverX isn’t hardcoded in the application code.
    • Review the application logic to ensure it’s not caching server endpoints independently of GemFire’s configuration.
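
The DNS check in step 6 can be run from the client host with plain Java, without any GemFire classes. The following is a minimal sketch, not a GemFire utility: it resolves a hostname and then attempts a TCP connection with a timeout, the same kind of connect that fails with "connect timed out" in the warning above. The class name StaleEndpointProbe, the 5-second timeout, and the default port 40404 (assumed here to be the cache server port; substitute the port your servers actually use) are illustrative assumptions.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

/** Illustrative helper (not part of GemFire): DNS and TCP reachability check for one endpoint. */
final class StaleEndpointProbe {

    /** Returns the resolved IP address, or null when the hostname does not resolve. */
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    /** Returns true when a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Includes SocketTimeoutException, connection refused, and unresolved hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass the real host and port as arguments.
        String host = args.length > 0 ? args[0] : "serverX";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 40404;

        String address = resolve(host);
        if (address == null) {
            System.out.println(host + " does not resolve from this machine");
            return;
        }
        System.out.println(host + " resolves to " + address);
        System.out.println("TCP connect to " + host + ":" + port + " succeeded: "
                + canConnect(host, port, 5000));
    }
}
```

If the removed server's hostname still resolves but the connection fails, a stale DNS record or hosts-file entry is one possible explanation. The result alone does not show which component is still handing that endpoint to the client.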

Additional Information

  • SocketTimeoutExceptions may persist for a short period after server removal until client pools are refreshed or the application is restarted.
  • In clusters with multiple locators, check every locator for configuration drift or inconsistent cluster views.
  • Regularly review client and locator configurations during cluster maintenance windows to prevent recurrence.
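
A periodic configuration review can be partly automated with a simple text search. The sketch below is one possible approach, not a GemFire tool: it walks a directory of client and locator configuration files and lists every line that mentions a removed server's hostname. The class name StaleReferenceScanner and the command-line arguments are hypothetical. The match is a case-insensitive substring test, so it also reports longer names that contain the search term.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative helper (not part of GemFire): finds lines that mention a removed host. */
final class StaleReferenceScanner {

    /** Returns "file:line: text" for every line under root that contains host, ignoring case. */
    static List<String> findReferences(Path root, String host) throws IOException {
        String needle = host.toLowerCase(Locale.ROOT);
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            List<String> lines;
            try {
                lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            } catch (IOException e) {
                continue; // Skip unreadable files and files that are not valid UTF-8 text.
            }
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).toLowerCase(Locale.ROOT).contains(needle)) {
                    hits.add(file + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java StaleReferenceScanner <config-dir> <hostname>
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String host = args.length > 1 ? args[1] : "serverX";
        findReferences(root, host).forEach(System.out::println);
    }
}
```

An empty result only shows that the scanned files do not contain the name. Endpoints supplied through environment variables, DNS aliases, or application code outside the scanned directory still need a manual check.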