"Time out waiting for host to join NSX Manager" - NSX Host install or upgrade fails at 48% due to ports blocked

Article ID: 398453

Products

VMware NSX

Issue/Introduction

  • ESXi host installation fails with "Time out waiting for host to join NSX Manager".
  • After a major-version ESXi host upgrade from 7 to 8, the transport node's status in NSX is 'Install Failed'.
  • Clicking the 'Install Failed' status shows that the host is stuck at 'Waiting for connection to Managers'.
  • On the NSX Manager, /var/log/proton/nsxapi.log shows:

2025-05-13T20:20:55.930Z  INFO UfoIndexer-BatchExecutor-search_policy-1 TransportNodeStatusServiceImpl 76576 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Management connection status for node ######-#######-####### is DOWN


2025-05-13T20:22:18.453Z  INFO ActivityWorkerPool-1-9 Esx60SfdmManager 76576 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Heart beat status DOWN for host ######-#######-#######
2025-05-13T20:22:18.453Z  INFO ActivityWorkerPool-1-9 Esx60SfdmManager 76576 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Waiting for heartbeat status for host ######-#######-#######, number of tries 48
2025-05-13T20:22:18.453Z ERROR ActivityWorkerPool-1-9 Esx60SfdmManager 76576 FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP26019" level="ERROR" subcomp="manager"] Time out waiting for host to join NSX Manager.
2025-05-13T20:22:18.454Z ERROR ActivityWorkerPool-1-9 DeploymentUnitActivityInstall 76576 FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP26050" level="ERROR" subcomp="manager"] Host prep failed for ######-#######-#######
2025-05-13T20:22:18.556Z  INFO UfoIndexer-BatchExecutor-search_manager-0 HostPrepServiceFabricDeploymentServiceImpl 76576 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Status is INSTALL_FAILED but Heartbeat is down so returning HOST_DISCONNECTED status for node : ######-#######-#######
  • A port connectivity test from the ESXi host to the NSX Managers fails for one or more of the required ports:

nc -v <NSX-Mgr-IP> 1234
nc -v <NSX-Mgr-IP> 1235
nc -v <NSX-Mgr-IP> 443

A working system returns a success message similar to this example:

#nc -v mgr01.example.com 443
Connection to mgr01.example.com 443 port [tcp/https] succeeded!

  • On rare occasions, the port 443 test above succeeds, but a content-aware firewall rule still blocks TLS/SSL traffic to this port. Such a rule also puts the host transport node into an "Install Failed" status. Run the command below to check whether a content-aware firewall rule is blocking TLS communication:

    #wget -S https://mgr01.example.com/

    A working system returns data similar to the following (part of the server response is omitted):

    #wget -S https://mgr01.example.com/
    Connecting to mgr01.example.com (192.168.#.#:443)
      HTTP/1.1 302 Found
      set-cookie: JSESSIONID=6B79####8F93####546C####B7F2###; Path=/; Secure; HttpOnly; SameSite=Lax
    Connecting to mgr01.example.com (192.168.#.#:443)
      HTTP/1.1 200 OK
      set-cookie: JSESSIONID=98D5####DE9E####1F35####FFBEC###; Path=/; Secure; HttpOnly; SameSite=Lax
      content-type: text/html;charset=UTF-8
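
The log signature described above can be checked quickly on the NSX Manager. The following is a minimal sketch, not an official tool: `find_join_timeouts` is a hypothetical helper name, and it only filters for the two error codes shown in the log excerpt in this article (MP26019 and MP26050). The default path is the one named above; pass another path as the first argument if the log has been copied elsewhere.

```shell
# Hypothetical helper: list host-prep timeout errors in an NSX Manager log.
# The default path is the one named in this article; pass another to override.
find_join_timeouts() {
    log_file="${1:-/var/log/proton/nsxapi.log}"
    # MP26019: "Time out waiting for host to join NSX Manager."
    # MP26050: "Host prep failed for <node>"
    grep -E 'errorCode="(MP26019|MP26050)"' "$log_file"
}

# Run only when a log path is supplied on the command line.
if [ "$#" -gt 0 ]; then
    find_join_timeouts "$1"
fi
```

If the function prints no lines, the log does not contain these two error codes, and the failure may have a different cause than the one covered here.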

Environment

VMware NSX-T Data Center 3.x
VMware NSX 4.x
VCF NSX 9.x

Cause

For ESXi host preparation to succeed, TCP ports 1234, 1235 and 443 must be open from the ESXi host to NSX Manager.
A connectivity failure between the host and NSX Manager on TCP port 1234 causes this issue.

Resolution

Open all ports required for this NSX version between the ESXi host and NSX Manager, typically 1234, 1235 and 443. See Required Ports.
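
Before retriggering the install, it can help to confirm from the ESXi shell that each port is now reachable. The sketch below loops over the three ports with the same `nc` tool used in the connectivity test earlier in this article. `check_ports` is a hypothetical helper name, and the script assumes the host's `nc` accepts `-z` (connect without sending data) and `-w` (timeout in seconds). If it does not, fall back to the plain `nc -v` commands.

```shell
# Minimal sketch: probe each required NSX Manager port from the ESXi shell.
# check_ports is a hypothetical helper name, not an NSX or ESXi command.
# Assumes nc accepts -z (connect only) and -w (timeout in seconds).
check_ports() {
    host="$1"
    shift
    failed=0
    for port in "$@"; do
        if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
            echo "OPEN $host:$port"
        else
            echo "BLOCKED $host:$port"
            failed=1
        fi
    done
    # Non-zero exit status if at least one port could not be reached.
    return "$failed"
}

# Run only when a manager address is supplied, for example:
#   sh check_ports.sh mgr01.example.com
if [ "$#" -gt 0 ]; then
    check_ports "$1" 1234 1235 443
fi
```

A BLOCKED result on any port suggests the firewall change is incomplete. An OPEN result on 443 only confirms the TCP handshake, so a content-aware rule that blocks TLS can still be present.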

Retrigger the install:

1. Re-save the host transport node configuration by editing the host and saving it.
2. Once saved, the current configuration is pushed to the host.
3. If the status is still 'Install Failed', click the host's 'Install Failed' status and then click 'Resolve'.
4. The host should then return to a Success state.

Additional Information