Transport Nodes lose Controller connection when an NSX-T Manager is added/deleted

Article ID: 318323



Products

VMware NSX

Issue/Introduction

  • The environment is running NSX-T Data Center 3.2.0 or 3.2.0.1.
  • An NSX-T Manager has been added to or removed from the management cluster.
  • Transport Nodes (Hosts or Edges) may show Controller Connectivity as down in the NSX UI.
  • On an ESX host, log entries similar to the following may appear in /var/run/log/nsx-syslog.log:
2022-03-09T13:56:40Z nsx-proxy: NSX 2100995 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" tid="2100995" level="ERROR" invalid="true"] VersionMastershipHandshakeClient: received MasterResponse UUID {<UUID4>} not in {<UUID1>, <UUID2>, <UUID3>}
  • The following behaviour may be observed:
Before any changes are made, the Transport Node is connected to 3 Managers and 1 Controller, which is the expected behaviour.

(from nsxcli shell)

> get managers 
- 192.168.1.10     Connected (NSX-RPC) *
- 192.168.1.11     Connected (NSX-RPC) 
- 192.168.1.12     Connected (NSX-RPC) 

> get controllers 
 Controller IP    Port     SSL         Status       Is Physical Master   Session State  Controller FQDN 
  192.168.1.10    1235   enabled     connected             true               up               NA
  192.168.1.11    1235   enabled      not used            false              null              NA       
  192.168.1.12    1235   enabled      not used            false              null              NA


In this example, a new Manager, 192.168.1.13, is added to the cluster to replace an existing node.
The new Manager appears in the Transport Node's Manager connections:

> get managers 
- 192.168.1.10     Connected (NSX-RPC) *
- 192.168.1.11     Connected (NSX-RPC) 
- 192.168.1.12     Connected (NSX-RPC) 
- 192.168.1.13     Connected (NSX-RPC)        <<<< New Manager


However, the new Manager IP is missing from the Controller list, and the Controller connection is now down:

> get controllers 
Thu Mar 10 2022 UTC 18:05:38.605
 Controller IP    Port     SSL         Status       Is Physical Master   Session State  Controller FQDN 
  192.168.1.10    1235   enabled     disconnected          true              down              NA 
  192.168.1.11    1235   enabled      not used            false              null              NA       
  192.168.1.12    1235   enabled      not used            false              null              NA
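
The disconnected state shown above can also be detected from saved `get controllers` output. The following is a minimal sketch, not a supported tool: the function names are hypothetical, and the field positions are assumed from the sample output in this article.

```shell
# Print the IP of every controller row that is both "connected" and "up".
# Reads `get controllers` output on stdin. Field positions are assumed from
# the sample output in this article: $1 = IP, $4 = Status, $6 = Session State.
# Header, timestamp and "not used" rows do not match and are skipped.
connected_controllers() {
    awk '$4 == "connected" && $6 == "up" { print $1 }'
}

# Exit status 0 if at least one controller session is up, 1 otherwise.
controller_session_up() {
    [ -n "$(connected_controllers)" ]
}
```

Assuming the nsxcli output can be captured non-interactively (for example with `nsxcli -c "get controllers"`, which should be verified on the host before relying on it), it could be piped into the check: `nsxcli -c "get controllers" | controller_session_up || echo "Controller connection is down"`.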


The new Controller information has not been pushed to the Transport Node:

(From root shell)

#egrep "server|fqdn" /etc/vmware/nsx/controller-info.xml
                        <server>192.168.1.10</server>
                        <server>192.168.1.11</server>
                        <server>192.168.1.12</server>


Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 3.x

Cause

A single Controller is responsible for handling the addition of a Controller to, or its removal from, the cluster.
This issue occurs because that Controller sends the updated list of Controllers only to the Transport Nodes connected to it. Transport Nodes sharded to other Controllers do not receive the updated list and therefore lose their Controller connectivity.
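
The effect can be illustrated with a toy model. This is only a sketch of the behaviour described above, not NSX code: the function name, the input format and the node names in the usage note are hypothetical.

```shell
# Toy model of the defect. Each input line on stdin is
#   <transport-node> <controller-it-is-sharded-to>
# Only nodes sharded to the controller that handles the membership change
# receive the updated controller list; every other node is printed as stale.
stale_nodes() {
    handling=$1
    while read -r node controller; do
        [ -n "$node" ] || continue                     # skip blank lines
        [ "$controller" = "$handling" ] || echo "$node"
    done
}
```

For example, `printf 'esx-01 192.168.1.10\nesx-02 192.168.1.11\n' | stale_nodes 192.168.1.10` prints `esx-02`, the node that would keep the outdated list.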

Resolution

This issue is resolved in NSX-T Data Center 3.2.1, available from Broadcom Support & Downloads.

Workaround:
To work around this issue, restart the nsx-proxy service on the impacted Transport Nodes each time a Manager is added to or removed from the Controller cluster.
Restarting the service repopulates controller-info.xml and allows the Controller connection to come up.

From the Transport Node root shell:

#/etc/init.d/nsx-proxy restart

Confirm that the Controller file is populated with the correct Manager IPs:

#egrep "server|fqdn" /etc/vmware/nsx/controller-info.xml
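
The check above can be scripted so that missing entries are easy to spot. The following is a minimal sketch: the function names are hypothetical, and it assumes the file contains one `<server>IP</server>` element per line, as in the sample output earlier in this article.

```shell
# List the <server> entries found in a controller-info.xml style file.
list_controller_servers() {
    sed -n 's:.*<server>\(.*\)</server>.*:\1:p' "$1"
}

# Print each expected Manager IP that is absent from the file.
# Usage: missing_controllers FILE IP [IP...]
missing_controllers() {
    file=$1
    shift
    for ip in "$@"; do
        # -x matches the whole line, -F treats the IP as a fixed string
        if ! list_controller_servers "$file" | grep -qxF "$ip"; then
            echo "$ip"
        fi
    done
}
```

For example, `missing_controllers /etc/vmware/nsx/controller-info.xml 192.168.1.10 192.168.1.11 192.168.1.12 192.168.1.13` prints nothing once the file lists all four Managers. Against the stale file shown in the Issue section it would print 192.168.1.13.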

Refresh the UI and confirm that the Transport Node status is healthy.


If the workaround above does not resolve the issue, collect the following support bundles and open a case with Broadcom Support:
1) NSX Manager support bundle
2) Host and Edge support bundles

Refer to the following document for instructions on collecting support bundles:
NSX Administration Guide / Collect Support Bundles