NSX install on ESXi Transport Node fails at 48% with error "Waiting for Connection to Managers"

Article ID: 397536

Updated On:

Products

VMware NSX

Issue/Introduction

  • You are trying to prepare newly added ESXi hosts in the cluster for NSX.
  • All the newly added ESXi hosts report the same error, "Waiting for Connection to Managers", and the NSX installation fails at 48%.
  • The screenshot below shows the error seen during the installation:



  • The ESXi hosts and the NSX Managers are on different subnets, with a physical firewall between them.
  • Basic checks from the ESXi host succeed: netcat to ports 1234 and 1235 works, ping works, and the manager's IP address and FQDN both resolve. However, the command "esxcli network ip connection list" shows the connections to these ports in TIME_WAIT state instead of ESTABLISHED, and the command "nsxcli -c get managers" shows all 3 managers in Standby state instead of Connected.

    The screenshot below shows the port connectivity checks and the other basic networking checks succeeding:



  • In nsx-syslog on the ESXi host, the following log entries show "Connection reset by peer" on port 1234:

    <Date and Time> nsx-proxy[2####26]: NSX 2####26 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2####26" level="WARNING"] StreamConnection[1178 Error to ssl://10.22.X.X:1234 sid:-1] Error 104-Connection reset by peer
    <Date and Time> nsx-proxy[2####26]: NSX 2####26 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="2####26" level="WARNING"] RpcConnection[1178 Connecting to ssl://10.22.X.X:1234 0] Couldn't connect to ssl://10.22.X.X:1234 (error: 104-Connection reset by peer)

  • During TCP session establishment on port 1234 between the ESXi host and the managers, RST packets are seen in both directions as soon as the TCP handshake completes, i.e., SYN >> SYN-ACK >> ACK >> RST.
  • The packet capture on the ESXi host shows the manager sending the RST, while the capture on the manager shows the ESXi host sending the RST. In reality, neither the ESXi host nor the manager initiates the RST packets; another device in the path generates them.

    Packet capture taken on the problem host at the uplink level, showing RST packets that appear to come from the manager side (10.22.X.X):



    Packet capture taken on the manager at VnicTx,VnicRx, showing RST packets that appear to come from the host side (10.64.x.x):



    Note: Although both the ESXi host and the manager receive RST packets, neither of them generates those packets.

Environment

VMware NSX
VMware NSX-T Data Center

Cause

Likely cause: an intermediary device in the physical infrastructure is generating RST packets on behalf of the ESXi host and the NSX Manager. The physical firewall may have IDPS enabled, with no rules defined to allow connections on port 1234 between the ESXi host and the NSX Manager.

Resolution

Based on the packet capture observations, investigate the physical network layer to identify which intermediary device is interacting with both the ESXi host and the NSX Manager, then fix the identified cause.
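
Once the intermediary device has been corrected, the connections to the managers should move from TIME_WAIT to ESTABLISHED. The following is a minimal sketch for checking that from saved "esxcli network ip connection list" output. It assumes the usual whitespace-separated column order (Proto, Recv Q, Send Q, Local Address, Foreign Address, State, ...), and the function name manager_connection_states is hypothetical.

```python
from collections import Counter

# Manager-facing ports named in this article.
NSX_PORTS = {"1234", "1235"}


def manager_connection_states(esxcli_output: str) -> Counter:
    """Count (foreign port, TCP state) pairs for connections to ports 1234/1235."""
    states = Counter()
    for line in esxcli_output.splitlines():
        fields = line.split()
        # Assumed layout: Proto, Recv Q, Send Q, Local Address,
        # Foreign Address, State, ... Header and separator rows are
        # skipped because their first field does not start with "tcp".
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign_port = fields[4].rsplit(":", 1)[-1]
        if foreign_port in NSX_PORTS:
            states[(foreign_port, fields[5])] += 1
    return states
```

A result that contains only TIME_WAIT entries for ports 1234 and 1235 matches the failing state described above, while ESTABLISHED entries for those ports indicate that the sessions are no longer being reset.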