NSX install on ESXi Transport Node fails at 48% "Waiting for Connection to Managers" due to stale NVDS reference

Article ID: 319044

Products

VMware NSX

Issue/Introduction

  • Preparing an ESXi Transport Node for NSX fails at 48%, at the step "Waiting for connection to Managers".

  • Filtering /var/log/syslog on NSX Manager for the Transport Node (TN) UUID shows the install progressing to 48%, followed by many "no record for heartbeat found" entries. The installation eventually fails and NSX Manager logs "Time out waiting for host to join NSX Manager.":
     
    2022-09-01T22:36:20.939Z IZA02092 NSX 4699 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Updating the DeploymentProgress from: DeploymentProgress [ id=<UUID>, deploymentType=HOST_TN, operationType=INSTALL, progress=48, stateDescription=deployment.progress.fn.registering_host, removeNsxFlag=false] to DeploymentProgress [ id=<UUID>, deploymentType=HOST_TN, operationType=INSTALL, progress=48, stateDescription=deployment.progress.fn.wait_for_mp_mpa_conn, removeNsxFlag=false]
    2022-09-01T22:36:20.953Z IZA02092 NSX 4699 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] getClientHeartbeatStatus: client <UUID>, no record for heartbeat found.
    2022-09-01T22:36:25.957Z IZA02092 NSX 4699 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] getClientHeartbeatStatus: client <UUID>, no record for heartbeat found.
    2022-09-01T22:36:30.962Z IZA02092 NSX 4699 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] getClientHeartbeatStatus: client <UUID>, no record for heartbeat found.
    .
    .
    .
    2022-09-01T22:40:11.142Z IZA02092 NSX 4699 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] getClientHeartbeatStatus: client <UUID>, no record for heartbeat found.
    2022-09-01T22:40:16.146Z IZA02092 NSX 4699 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] getClientHeartbeatStatus: client <UUID>, no record for heartbeat found.
    2022-09-01T22:40:16.149Z IZA02092 NSX 4699 FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP26050" level="ERROR" subcomp="manager"] Host prep failed for <UUID>.
    2022-09-01T22:40:16.180Z IZA02092 NSX 4699 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Updating the deploymentProgressState for deploymentUnitInstance: DeploymentUnitInstance [ id=DeploymentUnitInstance/<UUID>, deploymentUnitId=DeploymentUnit/<UUID>, hostId=HostTransportNode/<UUID>, entityId=null, prevEntityId=null, runningVersion=null, deploymentProgressState=INSTALL_FAILED, deploymentGoalState=ENABLED, internalLastKnownOSVersion=7.0.3, agentId=null, errorId=26050, errorMessage=Failed to install software on host. Time out waiting for host to join NSX Manager.] to INSTALL_FAILED:Failed to install software on host. Time out waiting for host to join NSX Manager.
      
  • Heartbeats between the ESXi host and NSX Manager are not established because the host fails to join the management plane:
     
    /var/log/nsx-syslog.log on host:
    2022-09-01T22:37:18.791Z nsx-sfhc[9787679]: NSX 9787679 - [nsx@6876 comp="nsx-esx" subcomp="nsxsfhc" tid="9787714" level="WARNING"] Command nsxcli -c "join management-plane <IP>  thumbprint <hash>   token **********  node-uuid <UUID>  " failed with return-code 4 (% Node registration failed: 'Failed to get management ip addresses' ).
    2022-09-01T22:37:18.791Z nsx-sfhc[9787679]: NSX 9787679 - [nsx@6876 comp="nsx-esx" subcomp="nsxsfhc" tid="9787714" level="INFO"] joinMP command was called. return value is 4
    2022-09-01T22:37:18.791Z nsx-sfhc[9787679]: NSX 9787679 - [nsx@6876 comp="nsx-esx" subcomp="nsxsfhc" tid="9787714" level="INFO"] Join MP command failed
     
  • /var/log/nsxcli.log on host shows error "Cannot find portset associated with the connection point":
     
    2022-09-01T22:37:18.713Z 9789807 cli.lib_esx.esx_utils ERROR Failed to get interface list: Errors:
    Cannot find portset associated with the connection point
     
    2022-09-01T22:37:18.721Z 9789807 cli.descriptors.cli_command_service WARNING Node registration failed: 'Failed to get management ip addresses'
    2022-09-01T22:37:18.725Z 9789807 cli.audit INFO CMD: join management-plane <IP> thumbprint <hash> token <token-obfuscated> node-uuid <UUID> (duration: 1.466s), Operation status: CMD_EXECUTED_WITH_ERROR_RESULT
     
  • The ESXi host was previously prepared for NSX with an NVDS host switch that has since been deleted.
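
The syslog check described above can be sketched as a small shell helper. This is a minimal sketch, not a supported tool: the function name tn_install_summary and its output labels are hypothetical, and it assumes the Transport Node UUID appears on the heartbeat and failure lines, as in the excerpts above.

```shell
#!/bin/sh
# Hypothetical helper: summarize host-prep log entries for one Transport Node.
# Usage: tn_install_summary <TN-UUID> [logfile]
# The log file defaults to /var/log/syslog on the NSX Manager.
tn_install_summary() {
    tn_uuid="$1"
    tn_log="${2:-/var/log/syslog}"

    # Count "no record for heartbeat found" warnings that mention this UUID.
    # "|| true" keeps a zero count from being treated as a failure.
    tn_misses=$(grep -F "$tn_uuid" "$tn_log" \
        | grep -c "no record for heartbeat found" || true)
    echo "heartbeat warnings: $tn_misses"

    # Report whether the join timeout was logged for this UUID.
    if grep -F "$tn_uuid" "$tn_log" \
        | grep -qF "Time out waiting for host to join NSX Manager"; then
        echo "install timed out: yes"
    else
        echo "install timed out: no"
    fi
}
```

Many heartbeat warnings together with a timeout match the symptom described in this article, but they do not identify the cause on their own. Confirm with the host-side logs listed above.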

Cause

When registering with the NSX management plane, the host runs the command 'localcli network ip interface list' to retrieve information about its own VMkernel interfaces (vmks). This command fails with the error "Cannot find portset associated with the connection point" if a vmk contains a stale reference to an NVDS that no longer exists on the host.
 
Example:
[root@host:~] localcli network ip interface list
Errors:
Cannot find portset associated with the connection point 
[root@host:~]
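
To test for this condition in a script, the command output can be checked for the error string. The sketch below is illustrative only: the function name has_stale_portset_error is hypothetical, and because this article does not state whether the error is written to stdout or stderr, the usage line merges both streams.

```shell
#!/bin/sh
# Hypothetical helper: read command output on stdin and return 0 (success)
# if it contains the stale-portset error quoted in this article.
has_stale_portset_error() {
    grep -q "Cannot find portset associated with the connection point"
}

# Usage on the ESXi host (not run here):
#   if localcli network ip interface list 2>&1 | has_stale_portset_error; then
#       echo "stale NVDS reference suspected"
#   fi
```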

Resolution

Identify and remove the stale NVDS references on any vmks.
 
For example, running esxcfg-vmknic -l on a host may show vmks not listed in the vCenter UI which are connected to NVDS portgroups.

  • On the ESXi host, run the command esxcfg-vmknic -l and compare the output with the host networking information in vCenter.
  • Note the name (vmk#) of any interface that does not match what is shown in vCenter. These stale interfaces should show 'false' under the Enabled column.
  • For each stale interface, run the command esxcli network ip interface remove --interface-name=<vmk#>, replacing <vmk#> with the name of the stale vmk.
  • Reboot the host.
  • In the NSX Manager UI, select the host and configure it for NSX again. Progress may remain at 48% for several minutes; this is expected.
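
When several stale vmks were noted, it can help to print the removal commands for review before running any of them. The following is a minimal dry-run sketch: the function name print_vmk_removals is hypothetical, it only echoes the command from the step above, and it never executes it.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) one removal command
# per stale vmk name passed as an argument.
print_vmk_removals() {
    for vmk_name in "$@"; do
        case "$vmk_name" in
            # Accept only names that start with "vmk" followed by a digit.
            vmk[0-9]*) ;;
            *)
                echo "skipping '$vmk_name': not a vmk name" >&2
                continue
                ;;
        esac
        echo "esxcli network ip interface remove --interface-name=$vmk_name"
    done
}

# Usage (names are examples only):
#   print_vmk_removals vmk10 vmk11
```

Check each printed name against vCenter before running the commands by hand. Removing a vmk that is still in use will disrupt host networking.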

-----

Another method to find NVDS references is to generate a host support bundle, extract it, and grep recursively for 'opaque' in the resulting /commands directory. Any "opaque-network-id" strings listed are of particular interest.
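
The bundle search can be sketched as follows. This is an illustrative helper, not part of any product: the function name find_opaque_refs is hypothetical, and the path passed to it is assumed to be the top-level directory of an already extracted bundle that contains a commands subdirectory.

```shell
#!/bin/sh
# Hypothetical helper: search an extracted host bundle for NVDS leftovers.
# Usage: find_opaque_refs <extracted-bundle-dir>
find_opaque_refs() {
    bundle_dir="$1"
    if [ ! -d "$bundle_dir/commands" ]; then
        echo "no commands directory under '$bundle_dir'" >&2
        return 1
    fi
    # -r: recurse into the directory, -n: show line numbers for each match.
    grep -rn "opaque" "$bundle_dir/commands"
}

# Usage (path is a placeholder):
#   find_opaque_refs /path/to/extracted-bundle | grep "opaque-network-id"
```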
 

Once the command 'localcli network ip interface list' returns interface output instead of the error, Transport Node installation can progress past this issue.

If you are unable to identify stale NVDS references on the vmks, contact Broadcom Support and note this Article ID (319044) in the problem description.

Additional Information

If you are contacting Broadcom support about this issue, please provide the following:

  • NSX Manager log bundles 
  • The vCenter support bundle, including the log bundle (vm-support) from the ESXi host failing to configure
  • Text of any error messages seen in the NSX GUI or command lines pertinent to the investigation

Handling Log Bundles for offline review with Broadcom support