VMware NSX for vSphere 6.x controller is excluded from the cluster with the message: Zookeeper client disconnected

Article ID: 339199

Products

VMware NSX

Issue/Introduction

Symptoms:

  • A deployed VMware NSX for vSphere 6.x controller disconnects from the controller cluster.
  • TCP listeners expected on a functioning NSX controller no longer appear in the output of the show network connections of-type tcp command.
  • The NSX controller logs contain entries similar to:

    D0525 13:46:07.185200 31975 rpc-broker.cc:369] Registering address resolution for: 20.x.x.x:7777
    D0525 13:46:07.185246 31975 rpc-tcp.cc:548] Handshake complete, both peers support the same protocol.
    D0525 13:46:07.197654 31975 rpc-tcp.cc:1048] Rejecting a connection from peer 10.x.x.x:42195/ef447643-xxxx-xxxx-xxxx-35630df39060, cluster 9f7ea8ff-xxxx-xxxx-xxxx-628e834aa8a5, which doesn't match our cluster (00000000-0000-0000-0000-000000000000).
    D0525 13:46:07.222869 31975 rpc-tcp.cc:1048] Rejecting a connection from peer 10.x.x.x:42195/ef447643-xxxx-xxxx-xxxx-35630df39060, cluster 9f7ea8ff-xxxx-xxxx-xxxx-628e834aa8a5, which doesn't match our cluster (00000000-0000-0000-0000-000000000000)


    For more information, see Collecting diagnostic information for VMware NSX for vSphere 6.x (2074678).
  • The disconnected controller attempts to join the cluster using an all-zeroes UUID, which is not valid.
  • The show control-cluster history command displays a message similar to:

    INFO.20150530-000550.1774:D0530 13:25:29.452639 1983 zookeeper_client.cc:774] Zookeeper client disconnected!
  • The output of the show log cloudnet/cloudnet_java-zookeeper*.log command in the NSX Controller console contains entries similar to:

    cloudnet_java-zookeeper.20150530-000550.1806.log-2015-05-30 13:25:07,382 47956539 [SyncThread:1] WARN org.apache.zookeeper.server.persistence.FileTxnLog - fsync-ing the write ahead log in SyncThread:1 took 3219ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide.

    Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
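
If you have exported the controller logs (for example, in a diagnostic bundle), a small script can flag both symptoms at once. The following is a minimal sketch, not a VMware tool: the regular expressions are modeled only on the log excerpts shown above and may not match every NSX build, and the 1000 ms fsync threshold is a hypothetical cutoff chosen for illustration.

```python
import re
from dataclasses import dataclass, field

# Patterns modeled on the log excerpts in this article; other builds may word these lines differently.
FSYNC_RE = re.compile(r"fsync-ing the write ahead log in (?P<thread>\S+) took (?P<ms>\d+)ms")
REJECT_RE = re.compile(
    r"Rejecting a connection from peer (?P<peer>[^,]+), cluster (?P<remote>[^,]+), "
    r"which doesn't match our cluster \((?P<local>[^)]+)\)"
)
ZERO_UUID = "00000000-0000-0000-0000-000000000000"


@dataclass
class Findings:
    slow_fsyncs_ms: list = field(default_factory=list)  # fsync durations at or above the threshold
    zero_uuid_rejections: int = 0  # rejections where our own cluster UUID is all zeroes


def scan(lines, fsync_threshold_ms=1000):
    """Scan log lines for slow WAL fsyncs and all-zeroes cluster UUID rejections.

    fsync_threshold_ms is a hypothetical cutoff, not a VMware-documented value.
    """
    found = Findings()
    for line in lines:
        m = FSYNC_RE.search(line)
        if m:
            ms = int(m.group("ms"))
            if ms >= fsync_threshold_ms:
                found.slow_fsyncs_ms.append(ms)
            continue
        m = REJECT_RE.search(line)
        if m and m.group("local") == ZERO_UUID:
            found.zero_uuid_rejections += 1
    return found
```

Usage, with a hypothetical exported file name: `with open("cloudnet_java-zookeeper.log") as f: print(scan(f))`. A non-empty `slow_fsyncs_ms` together with a non-zero `zero_uuid_rejections` matches the symptom pattern described in this article.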

Environment

VMware NSX for vSphere 6.2.x
VMware NSX for vSphere 6.1.x

Cause

This issue occurs because of slow disk performance, which adversely impacts the NSX controller cluster. The ZooKeeper process on the controller handles all I/O events in a single thread. If file write operations take too long to complete, controller keep-alive messages can be starved, and the controller is disconnected from the cluster.
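
The starvation effect can be illustrated with a toy model of one thread serving every I/O event in arrival order. This is a sketch of the general mechanism only, not the controller's actual code: the event list and the 2000 ms keep-alive timeout are hypothetical, and the 3219 ms write simply reuses the fsync duration from the log excerpt above.

```python
def keepalive_delays(events):
    """Toy FIFO model of one thread serving every I/O event.

    events: (arrival_ms, kind, service_ms) tuples sorted by arrival_ms.
    Returns, for each "keepalive" event, how long it waited before being handled.
    """
    clock = 0  # time at which the single thread becomes free
    delays = []
    for arrival_ms, kind, service_ms in events:
        start = max(clock, arrival_ms)  # the event waits until the thread is free
        if kind == "keepalive":
            delays.append(start - arrival_ms)
        clock = start + service_ms
    return delays


def starved(delays, timeout_ms):
    """Flag keep-alives whose wait exceeds a (hypothetical) timeout."""
    return [d > timeout_ms for d in delays]


if __name__ == "__main__":
    # A 3219 ms write followed by two keep-alives that take 1 ms each to handle.
    events = [(0, "write", 3219), (500, "keepalive", 1), (4000, "keepalive", 1)]
    d = keepalive_delays(events)
    print(d, starved(d, timeout_ms=2000))  # [2719, 0] [True, False]
```

The keep-alive that arrives during the slow write waits 2719 ms even though it needs only 1 ms of work, while the one that arrives after the write finishes is handled immediately. Faster storage shortens the write and therefore the wait.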

Resolution

Caution: NSX Controllers are storage sensitive.

VMware strongly recommends that you deploy NSX Controllers on low-latency disks. For more information, see Troubleshooting storage issues when using VMware products (2013160).
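
As a rough first check of a candidate datastore, you can time synchronous writes from a test VM (not the controller appliance) whose disk resides on that datastore. The sketch below is an assumption-laden illustration, not a VMware-supported test or a replacement for the storage troubleshooting article or a dedicated benchmark tool: it appends small blocks to a temporary file and times each `os.fsync()`, which is the same kind of operation the ZooKeeper write-ahead log performs. The block size and write count are arbitrary, and no pass/fail threshold is implied.

```python
import os
import statistics
import tempfile
import time


def fsync_latencies_ms(directory, writes=20, block_size=4096):
    """Append block_size bytes `writes` times, timing each os.fsync() in milliseconds."""
    payload = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    latencies = []
    try:
        for _ in range(writes):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # force the data to stable storage, as a write-ahead log does
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


if __name__ == "__main__":
    # Point this at a directory on the disk you want to evaluate.
    lat = fsync_latencies_ms(tempfile.gettempdir())
    print(f"median {statistics.median(lat):.1f} ms, max {max(lat):.1f} ms")
```

Multi-second results, such as the 3219 ms fsync in the log excerpt above, indicate storage that is unsuitable for NSX Controllers.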



Additional Information