ESXi host status is "Install Failed" in NSX after ESXi Upgrade

Article ID: 378075

Products

VMware NSX VMware vSphere ESXi

Issue/Introduction

  • The ESXi host was upgraded to a newer version, e.g. from 7.0 to 8.0.
  • In the NSX UI, the ESXi host reports its NSX Configuration status as Install Failed.
  • Clicking "Install Failed" shows a lengthy message that repeats the following for multiple VIBs: Node has invalid version <VIB version> of software <VIB name>
    For example:
    Node has invalid version 4.1.1.0.0-7.0.22224315 of software nsx-monitoring.
    [...]

    The NSX VIB version is shown in the following format: NSX.version.number-ESXi.version.number
    In the above example:
      • VIB version is 4.1.1.0.0-7.0.22224315
      • NSX version is 4.1.1.0.0
      • ESXi version is 7.0.22224315
  • The datapath of NSX workloads may not be impacted right after the ESXi upgrade; for example, VMs connected to an NSX-managed segment still have network access.
  • VMs may not be able to migrate to other ESXi Transport nodes affected by this issue. The related tasks in vCenter show failure messages such as:
    The operation performed on <host name> in Datacenter timed out
    Unable to automatically migrate <vm name> from <host name>
  • An error may be seen when a running VM attempts to migrate to an 8.x ESXi host:
    "Currently connected network interface" 'Network adapter <adapter number>' uses network '<dvPortGroup-name> (<vDS-name>)', which is not accessible.
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
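
The VIB version format described above can be split mechanically when the "Install Failed" message lists many VIBs. The following is a minimal Python sketch, not an official tool. It assumes the message text has been copied from the NSX UI into a string and that each entry follows the "Node has invalid version ... of software ..." shape quoted in this article. The function name parse_invalid_vib_messages is hypothetical.

```python
import re

# Message shape quoted in this article:
# "Node has invalid version <VIB version> of software <VIB name>"
# Assumption: VIB names contain only letters, digits, hyphens and underscores.
MESSAGE_RE = re.compile(
    r"Node has invalid version (?P<version>\S+) of software (?P<name>[A-Za-z0-9_-]+)"
)


def parse_invalid_vib_messages(text):
    """Return one dict per 'invalid version' entry found in text."""
    results = []
    for match in MESSAGE_RE.finditer(text):
        vib_version = match.group("version")
        # The first hyphen separates the NSX part from the ESXi part.
        nsx_version, _, esxi_version = vib_version.partition("-")
        results.append(
            {
                "name": match.group("name"),
                "vib_version": vib_version,
                "nsx_version": nsx_version,
                "esxi_version": esxi_version,
            }
        )
    return results


sample = "Node has invalid version 4.1.1.0.0-7.0.22224315 of software nsx-monitoring."
for entry in parse_invalid_vib_messages(sample):
    print(entry["name"], entry["nsx_version"], entry["esxi_version"])
```

Listing the ESXi part per VIB makes it easier to see at a glance that the installed NSX VIBs still carry the pre-upgrade ESXi version.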

Environment

VMware NSX

 

Cause

The NSX VIBs are different for ESXi 7.0 and ESXi 8.0.
The newly upgraded ESXi hosts still have the NSX VIBs built for the previous ESXi version rather than the correct ones.
As a result, the Install status validation fails and the host shows the 'Install Failed' status.
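
The mismatch described above can be illustrated with a short Python sketch. It assumes that the ESXi part of an NSX VIB version begins with the ESXi release it was built for (7.0 in this article's example), and that the host release is supplied by the reader as a string such as "8.0". The function name vib_targets_host_release is hypothetical, and this check is only an illustration, not the validation that NSX itself performs.

```python
def vib_targets_host_release(vib_version, host_release):
    """Return True if the ESXi part of vib_version starts with host_release.

    vib_version  -- e.g. "4.1.1.0.0-7.0.22224315" (NSX part, hyphen, ESXi part)
    host_release -- e.g. "8.0" (assumed to be supplied by the reader)
    """
    # Everything after the first hyphen is the ESXi part.
    _, _, esxi_part = vib_version.partition("-")
    # Compare whole dot-separated components, not raw string prefixes.
    host_parts = host_release.split(".")
    return esxi_part.split(".")[: len(host_parts)] == host_parts


vib = "4.1.1.0.0-7.0.22224315"
print(vib_targets_host_release(vib, "7.0"))
print(vib_targets_host_release(vib, "8.0"))
```

For the example VIB from this article, the check passes against release 7.0 and fails against 8.0, which corresponds to the state of a host that was upgraded without the matching NSX VIBs.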

Resolution

In a VMware NSX setup, before an upgrade to ESXi 8.0 and later, make sure that the NSX kernel module is part of the image or upgrade baseline.
For any ESXi hosts that are not upgraded, follow this document: Upgrading ESXi Hosts in an Environment With VMware NSX

For ESXi hosts that were upgraded and are showing errors:

  1. Attempt automatic resolution in NSX UI:
    1. Navigate to the list of ESXi Transport nodes:
      • NSX-T 3.2.2 and newer: System > Fabric > Hosts > Clusters
      • NSX-T 3.2.1 and prior: System > Fabric > Nodes > Host Transport Nodes
    2. In the list of ESXi Transport Nodes, click on the "Failed" state shown for the ESXi host.
    3. Click on the specific step that fails.
    4. Select the specific error message associated with the failed VIB installation.
    5. Click on the "RESOLVE" button.
      This triggers a re-installation of the NSX VIBs on the newly upgraded ESXi 8.0 hosts. Soon after, the status of the ESXi hosts shows "up".
    6. If it succeeds, the process takes approximately 10 minutes. For much of that time, the UI shows the progress at 18%.
  2. If "RESOLVE" does not fix the Install Failed status:
    1. Check the status of the Compute Manager (if applicable): navigate to System > Fabric > Compute Managers.
    2. If the Connection Status is Down, remediate the condition.
      To force a connection retry:
      1. Edit the Compute Manager.
      2. Click on Edit beside the FQDN.
      3. Re-enter the Credentials.
      4. Click on Save.
    3. Once the Compute Manager connection is restored, try to "RESOLVE" the installation failure on the ESXi host again (step 1 above).
  3. If "RESOLVE" still does not fix the Install Failed status:
    1. Place one affected ESXi host into Maintenance Mode.
    2. Migrate the ESXi host out of the cluster to the Datacenter level in vCenter.
    3. In the NSX UI, navigate to System > Fabric > Hosts > Other Nodes.
    4. If the ESXi host still shows Install Skipped or Install Failed, select the ESXi host and choose Remove NSX.
    5. If removal fails or the ESXi host shows as Orphaned, reselect the ESXi host, enable Force Remove, and retry.
    6. Wait until the ESXi host shows as Not Configured in the NSX UI.
    7. Move the ESXi host back into the cluster (keep it in Maintenance Mode).
    8. Monitor the NSX Manager UI for installation progress — the status should transition to SUCCESS.
    9. After confirming success on one ESXi host, repeat this process for the remaining affected ESXi hosts in the cluster.

Note: In some situations, the "RESOLVE" button might not be visible. If you do not see this button, try changing the zoom level in your web browser.

If you believe you have encountered this issue and are unable to implement the documented workarounds, please open a support case with Broadcom Support and refer to this KB article.
For more information, see Creating and managing Broadcom support cases.

Additional Information

Impact/Risks:
The nsxa agent is marked as down, and VMs cannot be migrated with vMotion to the affected ESXi host.

The upgrade precheck fails with the error:
Connectivity issue found between manager and transport node(s) XXXXXXXXX-XXXX-XXXX-XXXX-XXXX. The issue may affect the upgrade if left unresolved.

 

To further confirm why migrations to an ESXi host fail with an error such as:

"Currently connected network interface" 'Network adapter #' uses network '####-## (##-#######-########)', which is not accessible.

Run the following command on the ESXi host and check whether nsxa is marked down in its output: net-dvs -l

com.vmware.common.opaqueDvs.status.component.nsxa = down
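
Because the net-dvs -l output can be long, the nsxa line can be picked out with a small script. The following Python sketch assumes the command output has been saved to a string and that the status line has the "key = value" shape shown above. It also tolerates additional text after the value, which is an assumption rather than documented behavior. The function name nsxa_states is hypothetical.

```python
NSXA_KEY = "com.vmware.common.opaqueDvs.status.component.nsxa"


def nsxa_states(net_dvs_output):
    """Return every value reported for the nsxa status key, in order of appearance."""
    states = []
    for line in net_dvs_output.splitlines():
        key, sep, value = line.partition("=")
        if not sep or key.strip() != NSXA_KEY:
            continue
        # Keep only the first token after "=", in case more text follows (assumption).
        tokens = value.split()
        if tokens:
            states.append(tokens[0].rstrip(","))
    return states


sample = "com.vmware.common.opaqueDvs.status.component.nsxa = down"
states = nsxa_states(sample)
print(states)
if "down" in states:
    print("nsxa is marked down on at least one switch")
```

An empty list means the key was not found in the text, which should be treated as "unknown" rather than as a healthy state.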