VMs running on ESXi hosts managed by the Secondary NSX Manager become unreachable

Article ID: 326352

Products

VMware NSX Data Center for vSphere

Issue/Introduction

Symptoms:
  • VMs running on ESXi hosts managed by the Secondary NSX Manager become unreachable.
  • NSX Host preparation health check shows a Warning for the connection between the ESXi hosts managed by the Secondary NSX Manager and the NSX controller.
  • ESXi hosts managed by the Secondary NSX Manager are not connected to the NSX controllers. To check, run:
    • # esxcli network ip connection list | grep 1234
      • Example of expected output on a healthy host:
      • tcp         0       0  <IP Address>:32071              <NSX controller1 IP>:1234   ESTABLISHED   2101695  newreno  netcpa
        tcp         0       0  <IP Address>:12212              <NSX controller2 IP>:1234   ESTABLISHED   2101695  newreno  netcpa
        tcp         0       0  <IP Address>:52938              <NSX controller3 IP>:1234   ESTABLISHED   2101695  newreno  netcpa
  • On the ESXi hosts managed by the Secondary NSX Manager, the file /etc/vmware/netcpa/config-by-vsm.xml does not contain any NSX controller information:
    • grep -E 'connectionList|server|port' /etc/vmware/netcpa/config-by-vsm.xml
              <connectionList>
              </connectionList>
      
      Example of expected output:
              <connectionList>
                              <server><NSX controller1 IP></server>
                              <port>1234</port>
                              <server><NSX controller2 IP></server>
                              <port>1234</port>
                              <server><NSX controller3 IP></server>
                              <port>1234</port>
              </connectionList>

 
  • Secondary NSX Manager logs (vsm.log) display messages similar to the following, indicating that the NSX Manager service restarted:
    • vsm.log.1:2019-07-12 10:55:18.757 CEST  INFO localhost-startStop-2 VsmServletContextListener:75 - NSX Status : STOPPED
      vsm.log.1:2019-07-12 10:56:03.588 CEST  INFO localhost-startStop-1 VsmServletContextListener:75 - NSX Status : STARTING
      vsm.log.1:2019-07-12 10:58:41.847 CEST  INFO localhost-startStop-1 VsmServletContextListener:75 - NSX Status : RUNNING
  • Secondary NSX Manager logs (nsx-wrapper.log) display messages similar to the following at the time the NSX Manager service restarted:
    • INFO   | jvm 1    | 2019/07/12 10:55:12 | WrapperManager Error: Found 2 deadlocked threads!
      STATUS | wrapper  | 2019/07/12 10:55:12 | A Thread Deadlock was detected.  Restarting JVM
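
When working from extracted copies of the Secondary NSX Manager logs, the two log signatures above can be searched for with a pair of small shell functions. This is a minimal sketch under stated assumptions, not a VMware tool: the function names show_status_changes and has_deadlock_restart are hypothetical, and the file paths passed to them are wherever the logs were copied or extracted.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of NSX).

# Print the NSX Manager service state transitions recorded in the
# vsm.log-style file passed as $1.
show_status_changes() {
    grep -E 'NSX Status : (STOPPED|STARTING|RUNNING)' "$1"
}

# Return 0 if the nsx-wrapper.log-style file passed as $1 records a
# JVM restart triggered by a thread deadlock, non-zero otherwise.
has_deadlock_restart() {
    grep -q 'A Thread Deadlock was detected' "$1"
}
```

For example, `has_deadlock_restart nsx-wrapper.log && show_status_changes vsm.log.1` prints the state transitions only when the deadlock message is present. Matching timestamps in the two outputs still have to be compared by eye.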

Environment

VMware NSX Data Center for vSphere 6.4.x
VMware NSX for vSphere 6.3.x
VMware NSX for vSphere 6.4.x

Cause

This issue is caused by a race condition that puts the Secondary NSX Manager into an inconsistent state. As a result, NSX controller information is not pushed to the ESXi hosts managed by the Secondary NSX Manager.

Resolution

This issue is resolved in VMware NSX Data Center for vSphere 6.4.5.

Workaround:
To work around the issue, do one of the following:
  • Delete one NSX controller on the Primary NSX Manager and deploy a new controller.
  • Remove the Secondary role from the Secondary NSX Manager and re-add it as Secondary.
Either workaround clears the inconsistent state of the Secondary NSX Manager. The file /etc/vmware/netcpa/config-by-vsm.xml on the ESXi hosts managed by the Secondary NSX Manager is then updated with the NSX controller information, and the connections to the NSX controllers are restored.

Verify on the ESXi host using:
# grep -E 'connectionList|server|port' /etc/vmware/netcpa/config-by-vsm.xml
# esxcli network ip connection list | grep 1234
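
The two verification commands can be combined into one check that compares the number of controllers in the configuration file with the number of established sessions on port 1234. The following is a minimal sketch, not a VMware-supplied script. The function names are hypothetical. It assumes one <server> element per line, as in the grep output shown earlier, and that the foreign address and the connection state are the fifth and sixth columns of the esxcli output, as in the example output in this article.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of ESXi or NSX).

# Count <server> entries in the netcpa config file passed as $1.
# Assumes one <server> element per line; prints 0 if none or unreadable.
count_configured_controllers() {
    n=$(grep -c '<server>' "$1" 2>/dev/null) || n=0
    echo "$n"
}

# Read `esxcli network ip connection list` output on stdin and count
# ESTABLISHED sessions whose foreign address ends in port 1234.
count_established_sessions() {
    awk '$5 ~ /:1234$/ && $6 == "ESTABLISHED" { n++ } END { print n + 0 }'
}

# Compare both counts on a live ESXi host. $1 optionally overrides the
# config file path.
check_controller_connectivity() {
    cfg=${1:-/etc/vmware/netcpa/config-by-vsm.xml}
    configured=$(count_configured_controllers "$cfg")
    established=$(esxcli network ip connection list | count_established_sessions)
    if [ "$configured" -gt 0 ] && [ "$configured" -eq "$established" ]; then
        echo "OK: $established of $configured controller sessions established"
    else
        echo "CHECK: $established established, $configured configured"
        return 1
    fi
}
```

Running `check_controller_connectivity` on a host after applying a workaround should report OK once the configuration file lists the controllers and a session to each one is established. An empty <connectionList> yields a configured count of 0 and the CHECK message.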