VMs running on ESXi hosts managed by the Secondary NSX Manager become unreachable

Article ID: 326352

Products

VMware NSX

Issue/Introduction

Symptoms:

VMs running on ESXi hosts managed by the Secondary NSX Manager become unreachable.

The NSX host preparation health check shows a Warning for the connection between the NSX controllers and the ESXi hosts managed by the Secondary NSX Manager.

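ESXi hosts managed by the Secondary NSX Manager are not connected to the NSX controllers. To check, list the host's connections to the controller port (1234):

# esxcli network ip connection list | grep 1234

Example of expected output on a host that is connected to the controllers: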
tcp 0 0 192.168.110.51:32071 192.168.121.33:1234 ESTABLISHED 2101695 newreno netcpa
tcp 0 0 192.168.110.51:12212 192.168.121.31:1234 ESTABLISHED 2101695 newreno netcpa
tcp 0 0 192.168.110.51:52938 192.168.121.32:1234 ESTABLISHED 2101695 newreno netcpa
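
When many hosts need checking, the saved output of the command above can be parsed instead of read by eye. The following is a minimal sketch, assuming the whitespace-separated column layout shown in the example output; the function name `controller_connections` and the variable `SAMPLE_CONNECTIONS` are hypothetical and not part of any NSX or ESXi tooling.

```python
def controller_connections(output, port=1234):
    """Return the remote IPs with an ESTABLISHED netcpa connection to `port`."""
    controllers = []
    for line in output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state,
        # world id, congestion algorithm, world name.
        if len(fields) < 9 or fields[0] != "tcp":
            continue
        foreign, state, world = fields[4], fields[5], fields[-1]
        ip, _, remote_port = foreign.rpartition(":")
        if remote_port == str(port) and state == "ESTABLISHED" and world == "netcpa":
            controllers.append(ip)
    return sorted(controllers)


# Sample taken from the expected output in this article.
SAMPLE_CONNECTIONS = """\
tcp 0 0 192.168.110.51:32071 192.168.121.33:1234 ESTABLISHED 2101695 newreno netcpa
tcp 0 0 192.168.110.51:12212 192.168.121.31:1234 ESTABLISHED 2101695 newreno netcpa
tcp 0 0 192.168.110.51:52938 192.168.121.32:1234 ESTABLISHED 2101695 newreno netcpa
"""

print(controller_connections(SAMPLE_CONNECTIONS))
```

An empty list for a host managed by the Secondary NSX Manager matches the symptom described here.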

On the ESXi hosts managed by the Secondary NSX Manager, the file /etc/vmware/netcpa/config-by-vsm.xml does not contain any NSX controller information:

# grep -E 'connectionList|server|port' /etc/vmware/netcpa/config-by-vsm.xml
<connectionList>
</connectionList>

Example of expected output on a host that has received the controller information:
<connectionList>
<server>192.168.121.31</server>
<port>1234</port>
<server>192.168.121.32</server>
<port>1234</port>
<server>192.168.121.33</server>
<port>1234</port>
</connectionList>
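
The same comparison can be scripted against a copy of the file. This is a sketch only: it assumes that each controller entry carries exactly one `<server>` and one `<port>` element, as in the grep output above, and it pairs them by order of appearance rather than relying on the full XML schema, which this article does not show. The names `controllers_in_config`, `HEALTHY_CONFIG` and `AFFECTED_CONFIG` are hypothetical.

```python
import re


def controllers_in_config(xml_text):
    """Return (server, port) pairs found in config-by-vsm.xml text."""
    servers = re.findall(r"<server>\s*([^<\s]+)\s*</server>", xml_text)
    ports = re.findall(r"<port>\s*(\d+)\s*</port>", xml_text)
    # Assumption: the n-th <server> belongs with the n-th <port>.
    return list(zip(servers, (int(p) for p in ports)))


# Samples taken from the grep output shown in this article.
HEALTHY_CONFIG = """\
<connectionList>
<server>192.168.121.31</server>
<port>1234</port>
<server>192.168.121.32</server>
<port>1234</port>
<server>192.168.121.33</server>
<port>1234</port>
</connectionList>
"""
AFFECTED_CONFIG = "<connectionList>\n</connectionList>\n"

print(controllers_in_config(HEALTHY_CONFIG))
print(controllers_in_config(AFFECTED_CONFIG))
```

An empty result corresponds to the empty `<connectionList>` seen on affected hosts.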

The Secondary NSX Manager log (vsm.log) contains messages similar to the following, indicating that the NSX Manager service restarted:

vsm.log.1:2019-07-12 10:55:18.757 CEST INFO localhost-startStop-2 VsmServletContextListener:75 - NSX Status : STOPPED
vsm.log.1:2019-07-12 10:56:03.588 CEST INFO localhost-startStop-1 VsmServletContextListener:75 - NSX Status : STARTING
vsm.log.1:2019-07-12 10:58:41.847 CEST INFO localhost-startStop-1 VsmServletContextListener:75 - NSX Status : RUNNING
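
To locate such a restart in a large log bundle, the status lines can be extracted programmatically. The sketch below assumes the line layout of the excerpt above (timestamp, time zone, log level, then "NSX Status : <state>"); the names `status_transitions`, `service_restarted` and `SAMPLE_VSM_LOG` are hypothetical.

```python
import re

# Matches e.g. "2019-07-12 10:55:18.757 CEST INFO ... NSX Status : STOPPED"
STATUS_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ INFO .*NSX Status : (\w+)"
)


def status_transitions(lines):
    """Return (timestamp, status) tuples in the order they appear."""
    found = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


def service_restarted(transitions):
    """True if a STOPPED status is followed later by a RUNNING status."""
    states = [state for _, state in transitions]
    if "STOPPED" not in states:
        return False
    return "RUNNING" in states[states.index("STOPPED"):]


# Sample taken from the vsm.log excerpt in this article.
SAMPLE_VSM_LOG = [
    "vsm.log.1:2019-07-12 10:55:18.757 CEST INFO localhost-startStop-2 VsmServletContextListener:75 - NSX Status : STOPPED",
    "vsm.log.1:2019-07-12 10:56:03.588 CEST INFO localhost-startStop-1 VsmServletContextListener:75 - NSX Status : STARTING",
    "vsm.log.1:2019-07-12 10:58:41.847 CEST INFO localhost-startStop-1 VsmServletContextListener:75 - NSX Status : RUNNING",
]

for timestamp, state in status_transitions(SAMPLE_VSM_LOG):
    print(timestamp, state)
```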

The Secondary NSX Manager log (nsx-wrapper.log) contains messages similar to the following at the time the NSX Manager service restarted:

INFO | jvm 1 | 2019/07/12 10:55:12 | WrapperManager Error: Found 2 deadlocked threads!
STATUS | wrapper | 2019/07/12 10:55:12 | A Thread Deadlock was detected. Restarting JVM
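
The deadlock-triggered restart can be picked out of nsx-wrapper.log in a similar way, which makes it easy to compare its timestamp with the restart recorded in vsm.log. This sketch assumes the pipe-separated layout of the two lines above (level, source, timestamp, message); `deadlock_restarts` and `SAMPLE_WRAPPER_LOG` are hypothetical names.

```python
def deadlock_restarts(lines):
    """Return timestamps of wrapper messages reporting a thread deadlock."""
    hits = []
    for line in lines:
        # Split into at most four parts so a "|" inside the message is kept.
        parts = [part.strip() for part in line.split("|", 3)]
        if len(parts) == 4 and "Thread Deadlock was detected" in parts[3]:
            hits.append(parts[2])
    return hits


# Sample taken from the nsx-wrapper.log excerpt in this article.
SAMPLE_WRAPPER_LOG = [
    "INFO | jvm 1 | 2019/07/12 10:55:12 | WrapperManager Error: Found 2 deadlocked threads!",
    "STATUS | wrapper | 2019/07/12 10:55:12 | A Thread Deadlock was detected. Restarting JVM",
]

print(deadlock_restarts(SAMPLE_WRAPPER_LOG))
```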

Environment

VMware NSX Data Center for vSphere 6.4.x
VMware NSX for vSphere 6.3.x
VMware NSX for vSphere 6.4.x

Cause

This issue is caused by a race condition that puts the Secondary NSX Manager into an inconsistent state. As a result, NSX controller information is not pushed to the ESXi hosts managed by the Secondary NSX Manager.

Resolution

This issue is resolved in VMware NSX Data Center for vSphere 6.4.5.

Workaround:
To work around the issue, do either of the following:

Delete one NSX controller on the Primary NSX Manager and deploy a new controller.
Remove the Secondary role from the Secondary NSX Manager, then re-add it as Secondary.

Either workaround clears the inconsistent state of the Secondary NSX Manager. The file /etc/vmware/netcpa/config-by-vsm.xml on the ESXi hosts managed by the Secondary NSX Manager is then updated with the NSX controller information, and the connections to the NSX controllers are restored.

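To verify, run the following commands on each affected ESXi host and confirm that the controller entries are present in the file and that the connections on port 1234 are ESTABLISHED:
# grep -E 'connectionList|server|port' /etc/vmware/netcpa/config-by-vsm.xml
# esxcli network ip connection list | grep 1234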
