VMware NSX for vSphere 6.x controller is excluded from the cluster with the message: Zookeeper client disconnected
Article ID: 339199
Products
VMware NSX
Issue/Introduction
Symptoms:
A deployed VMware NSX for vSphere 6.x controller disconnects from a controller cluster.
TCP listeners that are present on a functioning NSX controller no longer appear in the output of the show network connections of-type tcp command.
The NSX controller logs contain entries similar to:
D0525 13:46:07.185200 31975 rpc-broker.cc:369] Registering address resolution for: 20.x.x.x:7777
D0525 13:46:07.185246 31975 rpc-tcp.cc:548] Handshake complete, both peers support the same protocol.
D0525 13:46:07.197654 31975 rpc-tcp.cc:1048] Rejecting a connection from peer 10.x.x.x:42195/ef447643-xxxx-xxxx-xxxx-35630df39060, cluster 9f7ea8ff-xxxx-xxxx-xxxx-628e834aa8a5, which doesn't match our cluster (00000000-0000-0000-0000-000000000000).
D0525 13:46:07.222869 31975 rpc-tcp.cc:1048] Rejecting a connection from peer 10.x.x.x:42195/ef447643-xxxx-xxxx-xxxx-35630df39060, cluster 9f7ea8ff-xxxx-xxxx-xxxx-628e834aa8a5, which doesn't match our cluster (00000000-0000-0000-0000-000000000000)
Running the show log cloudnet/cloudnet_java-zookeeper*.log command in the NSX Controller console returns entries similar to:
cloudnet_java-zookeeper.20150530-000550.1806.log-2015-05-30 13:25:07,382 47956539 [SyncThread:1] WARN org.apache.zookeeper.server.persistence.FileTxnLog - fsync-ing the write ahead log in SyncThread:1 took 3219ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide.
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
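When many ZooKeeper log lines have been collected, it can help to pull out only the slow fsync warnings and their durations. The following is a minimal sketch, assuming the log text has already been copied off the controller into a list of strings. The function name slow_fsyncs and the 1000 ms threshold are illustrative choices, not values defined by VMware or ZooKeeper.

```python
import re

# Matches the ZooKeeper FileTxnLog warning shown above and captures
# the sync thread name and the fsync duration in milliseconds.
FSYNC_RE = re.compile(r"fsync-ing the write ahead log in (\S+) took (\d+)ms")


def slow_fsyncs(lines, threshold_ms=1000):
    """Return (thread, duration_ms) for each fsync warning at or above threshold_ms.

    threshold_ms is a hypothetical cut-off chosen for illustration only.
    """
    hits = []
    for line in lines:
        match = FSYNC_RE.search(line)
        if match and int(match.group(2)) >= threshold_ms:
            hits.append((match.group(1), int(match.group(2))))
    return hits


# Sample input: the log excerpt from this article plus an unrelated line.
sample = [
    "cloudnet_java-zookeeper.20150530-000550.1806.log-2015-05-30 13:25:07,382 "
    "47956539 [SyncThread:1] WARN org.apache.zookeeper.server.persistence.FileTxnLog "
    "- fsync-ing the write ahead log in SyncThread:1 took 3219ms which will "
    "adversely effect operation latency. See the ZooKeeper troubleshooting guide.",
    "an unrelated log line",
]

print(slow_fsyncs(sample))
```

Frequent or multi-second entries in the result point toward the disk latency problem described under Cause.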
Environment
VMware NSX for vSphere 6.2.x
VMware NSX for vSphere 6.1.x
Cause
This issue occurs due to slow disk performance, which adversely impacts the NSX controller cluster. The controller's ZooKeeper process handles all I/O events in a single thread. If file write operations are slow, they occupy that thread and controller keep-alive messages may be starved.
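To get a rough feel for the kind of write latency involved, the sketch below times repeated write-and-fsync cycles on a temporary file. It is a generic illustration, not a VMware tool, and it is meant for a test machine or a guest on the storage being evaluated rather than for the controller appliance itself. The function name measure_fsync_ms, the 4 KB payload, and the write count are assumptions made for the example.

```python
import os
import tempfile
import time


def measure_fsync_ms(directory=None, writes=20, payload=b"x" * 4096):
    """Time write+fsync cycles on a temporary file.

    Returns one latency in milliseconds per cycle. 'directory' selects
    where the temporary file is created (default: the system temp dir).
    """
    latencies = []
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        for _ in range(writes):
            start = time.perf_counter()
            handle.write(payload)
            handle.flush()             # push Python's buffer to the OS
            os.fsync(handle.fileno())  # ask the OS to commit to disk
            latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


if __name__ == "__main__":
    samples = measure_fsync_ms()
    print(f"max fsync latency: {max(samples):.1f} ms over {len(samples)} writes")
```

Because a single thread performs each fsync and waits for it to finish, any cycle that takes seconds, like the 3219 ms in the log excerpt, delays everything else queued on that thread for the same amount of time.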