When GemFire/Geode clients are switched from Cluster 1 to Cluster 2 by changing a Load Balancer (LB) target, without restarting the client application, there is a high risk of PDX Type Mismatch or Unknown PDX Type errors. This happens because each client keeps a local registry of PDX type metadata (type definitions and the numeric IDs assigned to them) that may not match the new cluster.
Scenario 1: New PDX type not yet replicated
A specific race can occur during the transition if a new PDX type is introduced shortly before the switch:
1. A new PDX type (e.g., a new domain object) is defined and used in Cluster 1.
2. Cluster 1 is shut down before the type metadata has replicated over the WAN to Cluster 2.
3. The client switches to Cluster 2 and sends an object serialized with that type's ID.
4. Cluster 2 has no metadata for this type and cannot deserialize the data (an Unknown PDX Type error).
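The ordering problem can be reproduced with a toy model. This is a minimal sketch, assuming one type registry per cluster and a WAN queue that is either drained or lost when Cluster 1 stops; `TypeRegistryModel`, the type name `NewOrderEvent`, and ID 101 are hypothetical and do not correspond to GemFire classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a cluster-wide PDX type registry.
// Real GemFire/Geode registries are far more involved; this only shows the ordering problem.
class TypeRegistryModel {
    final String name;
    final Map<Integer, String> typesById = new HashMap<>();
    // Type definitions waiting to be shipped to the remote cluster over the WAN.
    final Deque<Map.Entry<Integer, String>> wanQueue = new ArrayDeque<>();

    TypeRegistryModel(String name) {
        this.name = name;
    }

    // Define a type locally and queue its metadata for asynchronous WAN replication.
    void defineType(int id, String typeName) {
        typesById.put(id, typeName);
        wanQueue.add(Map.entry(id, typeName));
    }

    // Deliver all queued type definitions to the remote cluster.
    void drainWanTo(TypeRegistryModel remote) {
        while (!wanQueue.isEmpty()) {
            Map.Entry<Integer, String> e = wanQueue.poll();
            remote.typesById.put(e.getKey(), e.getValue());
        }
    }

    // Resolve the type ID carried in serialized bytes; fails like an "unknown PDX type".
    String resolve(int typeId) {
        String typeName = typesById.get(typeId);
        if (typeName == null) {
            throw new IllegalStateException(name + ": unknown PDX type id " + typeId);
        }
        return typeName;
    }
}
```

If Cluster 1 stops before `drainWanTo` runs, calling `resolve(101)` on Cluster 2 throws, which corresponds to step 4; draining the queue first lets Cluster 2 resolve the ID.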
Scenario 2: Stale client PDX registry
Clients cache PDX type mappings (e.g., ID 101 = MyObject) in local memory. If the client's registry is not reset during the switch:
- It may use an ID from Cluster 1 that does not exist in Cluster 2.
- It may use an ID that Cluster 2 has already assigned to a completely different type, so the data is read with the wrong type definition, leading to data corruption.
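Both failure modes come from the client trusting IDs it learned from Cluster 1. The sketch below is a hypothetical model (`StaleRegistryDemo`, `Invoice`, and `OtherObject` are illustrative, not GemFire names) in which the client keeps a type-to-ID cache and Cluster 2 keeps its own ID-to-type map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: the client's cached typeName -> typeId mappings learned from Cluster 1,
// and Cluster 2's own typeId -> typeName assignments.
class StaleRegistryDemo {
    // What the client learned while connected to Cluster 1.
    final Map<String, Integer> clientIdsByType = new HashMap<>();
    // What Cluster 2 has actually assigned.
    final Map<Integer, String> cluster2TypesById = new HashMap<>();

    // A client with a stale registry serializes with the cached ID and never asks Cluster 2.
    int serialize(String typeName) {
        Integer id = clientIdsByType.get(typeName);
        if (id == null) {
            throw new IllegalStateException("type not cached in this model: " + typeName);
        }
        return id;
    }

    // Cluster 2 interprets the ID using only its own registry.
    String deserializeOnCluster2(int typeId) {
        return cluster2TypesById.getOrDefault(typeId, "<unknown PDX type " + typeId + ">");
    }
}
```

With `MyObject` cached as 101 and Cluster 2 having assigned 101 to `OtherObject`, the bytes are read as `OtherObject` (the corruption case); a cached ID that Cluster 2 never assigned comes back as unknown.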
Scenario 3: Split connectivity
If the Load Balancer does not perform a "hard cut" (moving all traffic at once), a client might briefly hold connections to both clusters.
Risk: The client could put data to one cluster and get it from the other. If WAN replication has not finished, the data will be missing or inconsistent.
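A toy model of the split-connectivity risk, assuming WAN replication behaves like an asynchronous queue that has not yet been applied on Cluster 2; `SplitConnectivityDemo` and the key names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of two clusters linked by an asynchronous WAN queue.
class SplitConnectivityDemo {
    final Map<String, String> cluster1 = new HashMap<>();
    final Map<String, String> cluster2 = new HashMap<>();
    // Entries written to Cluster 1 but not yet applied on Cluster 2.
    final List<String[]> pendingWan = new ArrayList<>();

    // Put lands on Cluster 1; replication to Cluster 2 happens later, not synchronously.
    void putViaCluster1(String key, String value) {
        cluster1.put(key, value);
        pendingWan.add(new String[] {key, value});
    }

    // Get is served by Cluster 2: null (missing) or an older value (inconsistent) until the queue drains.
    String getViaCluster2(String key) {
        return cluster2.get(key);
    }

    // Apply all pending WAN updates on Cluster 2, in order.
    void drainWan() {
        for (String[] kv : pendingWan) {
            cluster2.put(kv[0], kv[1]);
        }
        pendingWan.clear();
    }
}
```

A `getViaCluster2` issued right after `putViaCluster1` returns null, or the previous value if the key was replicated earlier, until `drainWan` runs.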
The most effective way to avoid the type-ID problems is to make sure the client clears its PDX registry when it loses its connection to Cluster 1. This forces it to re-register its types with Cluster 2 and obtain IDs from that cluster. Avoiding split connectivity additionally requires the ordered cutover below.
- GemFire 10.2+: Clients clear the registry on disconnect by default.
- Older clients: Set the JVM system property -Dgemfire.ON_DISCONNECT_CLEAR_PDXTYPEIDS=true
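For older clients the property is normally passed on the JVM command line. If it has to be set in code instead, a minimal sketch follows, assuming GemFire reads the property once when its client classes initialize, so it must be set before the client cache is created (the command-line flag remains the more reliable option). `ClientStartup` and its methods are hypothetical names, the locator host is a placeholder, and the `ClientCacheFactory` call is left as a comment so the snippet runs without GemFire on the classpath.

```java
// Hypothetical helper: enable PDX-registry clearing on disconnect for pre-10.2 clients.
final class ClientStartup {
    static final String CLEAR_PDX_ON_DISCONNECT = "gemfire.ON_DISCONNECT_CLEAR_PDXTYPEIDS";

    private ClientStartup() {}

    // Keep an explicit -D value from the command line; otherwise default it to true.
    // Assumption: GemFire reads this property during client initialization,
    // so this must run before the client cache is created.
    static void enableClearOnDisconnect() {
        if (System.getProperty(CLEAR_PDX_ON_DISCONNECT) == null) {
            System.setProperty(CLEAR_PDX_ON_DISCONNECT, "true");
        }
    }

    static boolean isClearOnDisconnectEnabled() {
        return Boolean.getBoolean(CLEAR_PDX_ON_DISCONNECT);
    }

    public static void main(String[] args) {
        enableClearOnDisconnect();
        // Only after this point create the client cache, for example (not compiled here,
        // host name is a placeholder):
        // ClientCache cache = new ClientCacheFactory()
        //     .addPoolLocator("lb.example.com", 10334)
        //     .create();
        System.out.println("clear-on-disconnect enabled: " + isClearOnDisconnectEnabled());
    }
}
```

Not overwriting an explicit value means an operator can still disable the behavior with `-Dgemfire.ON_DISCONNECT_CLEAR_PDXTYPEIDS=false` without a code change.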
To switch clusters safely when the Load Balancer is repointed to the new locators/servers:
1. Shut down Cluster 1: Stop the servers so that the client sockets are forced to close.
2. Wait for disconnect: Give the clients enough time to detect that the servers are gone. This triggers the registry-clear logic.
3. Switch the Load Balancer: Point the LB at the Cluster 2 locators.
4. Client reconnects: The client connects to Cluster 2 with an empty PDX registry, so no stale IDs can collide.
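The ordering of these steps matters more than the tooling. This is a sketch under the assumption that you have some way to stop Cluster 1, to tell whether clients have dropped their Cluster 1 connections (for example from client logs or an application health check), and to reconfigure the LB; `CutoverPlan` and its callbacks are hypothetical stand-ins, not GemFire or load-balancer APIs.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical cutover orchestration. The callbacks stand in for whatever tooling
// actually stops servers, checks clients, and reconfigures the load balancer.
final class CutoverPlan {
    private CutoverPlan() {}

    static void execute(Runnable shutdownCluster1,
                        BooleanSupplier clientsDisconnected,
                        Runnable pointLbAtCluster2,
                        Duration timeout,
                        Duration pollInterval) {
        // Step 1: stop the Cluster 1 servers so client sockets close.
        shutdownCluster1.run();

        // Step 2: wait until clients have noticed the disconnect (which clears their registry).
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!clientsDisconnected.getAsBoolean()) {
            if (System.nanoTime() - deadline > 0) {
                // Refuse to switch: a client still attached to Cluster 1 could end up split across clusters.
                throw new IllegalStateException(
                    "clients still connected to Cluster 1 after " + timeout + "; load balancer not switched");
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("cutover interrupted before switching the load balancer", e);
            }
        }

        // Step 3: only now point the load balancer at the Cluster 2 locators.
        pointLbAtCluster2.run();
        // Step 4 needs no action here: clients reconnect through the load balancer on their own.
    }
}
```

Failing closed on timeout is deliberate: switching the LB while some clients are still attached to Cluster 1 reopens the split-connectivity risk.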
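Note: If the registry is cleared between the shutdown of Cluster 1 and the first write to Cluster 2, the race described earlier, where a new type never reached Cluster 2, is neutralized: the client no longer holds the new type's ID, so it must register the type definition with Cluster 2 before writing.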
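Why clearing neutralizes the race can be shown with a small client/server model. This is a minimal sketch, assuming the client always has its own type definitions available and only the type-to-ID mapping is cached; `ServerRegistry`, `ClientRegistry`, `NewOrderEvent`, and the ID ranges are hypothetical, not GemFire internals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-cluster registry that assigns IDs to type definitions.
class ServerRegistry {
    private final Map<Integer, String> typesById = new HashMap<>();
    private int nextId;

    ServerRegistry(int firstId) {
        this.nextId = firstId;
    }

    // Return the existing ID for a known type, or assign a new one.
    int register(String typeName) {
        for (Map.Entry<Integer, String> e : typesById.entrySet()) {
            if (e.getValue().equals(typeName)) {
                return e.getKey();
            }
        }
        int id = nextId++;
        typesById.put(id, typeName);
        return id;
    }

    String resolve(int id) {
        String typeName = typesById.get(id);
        if (typeName == null) {
            throw new IllegalStateException("unknown PDX type id " + id);
        }
        return typeName;
    }
}

// Hypothetical client-side cache of type IDs, optionally cleared on disconnect.
class ClientRegistry {
    private final Map<String, Integer> idsByType = new HashMap<>();
    private final boolean clearOnDisconnect;
    private ServerRegistry server;

    ClientRegistry(ServerRegistry server, boolean clearOnDisconnect) {
        this.server = server;
        this.clearOnDisconnect = clearOnDisconnect;
    }

    void onDisconnect() {
        if (clearOnDisconnect) {
            idsByType.clear();
        }
    }

    void connectTo(ServerRegistry newServer) {
        server = newServer;
    }

    // Reuse a cached ID, or register the type with the current cluster first;
    // returns the type name the server reads back for the ID it receives.
    String write(String typeName) {
        int id = idsByType.computeIfAbsent(typeName, server::register);
        return server.resolve(id);
    }
}
```

With `clearOnDisconnect` set to true, the first write after `connectTo(cluster2)` registers the type on Cluster 2 and succeeds; with it set to false, the stale Cluster 1 ID is sent and Cluster 2 rejects it.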