"Failed to install software on host" error during upgrade NSX-T from 2.4.x to 2.5.0

Article ID: 344804

Products

VMware NSX Networking

Issue/Introduction

This article describes how to clean up NSX-T Data Center 2.4.x on hosts that failed to upgrade, so that they can then be upgraded to 2.5.0.

Symptoms:
  • Upgrading NSX-T Data Center from 2.4.x to 2.5.0 fails.
  • VMware Cloud Foundation (VCF) upgrades are also affected.
  • In the NSX Manager User Interface (UI), under System > Fabric > Nodes, the failed hosts display the message “Host Disconnected”.
  • Clicking the Resolve button puts the host into an "NSX install failed" status with an error similar to:

    Failed to install software on host. Failed to install software on host. xxx.xxx.xxx.xxx : java.rmi.RemoteException: [LockingError] Another process is updating the ESX image. Please try again later. Please refer to the log file for more details.
     
  • Rebooting the host clears this error, but clicking the Resolve button again produces a different error similar to:

    Failed to install software on host. Failed to install software on host. xxx.xxx.xxx.xxx : java.rmi.RemoteException: [LiveInstallationError] Error in running ['/etc/init.d/nsx-datapath', 'start', 'upgrade']: Return code: 1 Output: start upgrade begin Exception: Traceback (most recent call last): File "/etc/init.d/nsx-datapath", line 303, in backupUplinkMirrorSessions backupDict[BACKUP_UPLINK_MIRROR] =
     
  • Re-triggering the installation or rebooting the host again does not resolve this second error.


Environment

VMware NSX-T Data Center 2.x
VMware NSX-T Data Center

Cause

The first error ([LockingError]) occurs because of a module unloading failure.
The second error ([LiveInstallationError]) occurs because of a failure to clean up the overlay host switch.
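
When triaging several failed hosts, it can help to map the error text shown in NSX Manager to one of the two causes above. The following is only a minimal sketch: the function name `classify_nsx_install_error` and its output labels are hypothetical and not part of NSX-T. It assumes the error text contains the `[LockingError]` or `[LiveInstallationError]` tag as shown in the Symptoms section.

```shell
#!/bin/sh
# Hypothetical helper (not part of NSX-T): maps an NSX Manager error
# message to the cause described in this article.
classify_nsx_install_error() {
    case "$1" in
        # Quoted pattern text is matched literally, brackets included.
        *"[LockingError]"*)
            echo "module unloading failure" ;;
        *"[LiveInstallationError]"*"nsx-datapath"*)
            echo "overlay host switch cleanup failure" ;;
        *)
            echo "unknown" ;;
    esac
}
```

For example, passing the first error message from the Symptoms section as the single argument prints `module unloading failure`.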

Resolution

This is a known issue affecting VMware NSX-T Data Center 2.4.x.

The issue is fixed in NSX-T Data Center 2.5.1 and VCF 3.10.

Workaround:
To work around this issue:
  1. Reboot the hosts that failed to upgrade to clear the module unloading error.
  2. Run this script on each failed host to remove NSX-T 2.4.x:

    #!/bin/sh
    # Exit on the first failing command (-e) and echo each command as it runs (-x)
    set -ex
    # Remove uplinks (vmnic1 on DVPort uplink-1)
    esxcfg-vswitch -Q vmnic1 -V uplink-1 "DvsPortset-0"
    # Remove vmks: look up the DVPort ID of vmk50, then delete both VMkernel NICs
    DVPortID=$(nsxdp-cli vswitch instance list | grep vmk50 | awk '{print $3}')
    esxcfg-vmknic --netstack=vxlan -d -v 10 -s "DvsPortset-0"
    esxcfg-vmknic --netstack=hyperbus -d -v "$DVPortID" -s "DvsPortset-0"
    # Remove VDR port by unloading the nsxt-vdrb module
    vmkload_mod -u nsxt-vdrb
    # Remove Opaque switch
    vsish -e set /net/portsets/DvsPortset-0/destroy destroy
    # Remove NSX-T 2.4.x VIBs
    esxcli software vib remove -f -n nsx-adf -n nsx-aggservice -n nsx-cli-libs -n nsx-common-libs -n nsx-esx-datapath -n nsx-exporter -n nsx-host -n nsx-metrics-libs -n nsx-mpa -n nsx-nestdb-libs -n nsx-nestdb -n nsx-netcpa -n nsx-opsagent -n nsx-platform-client -n nsx-profiling-libs -n nsx-proxy -n nsx-python-gevent -n nsx-python-greenlet -n nsx-python-logging -n nsx-python-protobuf -n nsx-rpc-libs -n nsx-sfhc -n nsx-upm-libs -n nsx-vdpi

     
  3. In NSX Manager, go back to System > Fabric > Nodes and click Resolve for each failed host. This installs the NSX-T 2.5.0 VIBs on the host.
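
Before performing step 3, it may be worth confirming that the script in step 2 actually removed every VIB it names. The sketch below is an illustration, not part of the original workaround. The helper name `leftover_nsx_vibs` and the variable `NSX_VIBS` are hypothetical, and the VIB names are copied from the removal command in step 2. It assumes that the first column of `esxcli software vib list` output is the VIB name.

```shell
#!/bin/sh
# VIB names copied from the removal command in step 2 of this article.
NSX_VIBS="nsx-adf nsx-aggservice nsx-cli-libs nsx-common-libs nsx-esx-datapath nsx-exporter nsx-host nsx-metrics-libs nsx-mpa nsx-nestdb-libs nsx-nestdb nsx-netcpa nsx-opsagent nsx-platform-client nsx-profiling-libs nsx-proxy nsx-python-gevent nsx-python-greenlet nsx-python-logging nsx-python-protobuf nsx-rpc-libs nsx-sfhc nsx-upm-libs nsx-vdpi"

# Hypothetical helper: reads `esxcli software vib list` output on stdin and
# prints the name of every listed VIB that is still installed.
# Assumption: the VIB name is the first whitespace-separated column.
leftover_nsx_vibs() {
    awk -v names="$NSX_VIBS" '
        BEGIN {
            # Build a lookup table of the VIB names to check for.
            n = split(names, list, " ")
            for (i = 1; i <= n; i++) want[list[i]] = 1
        }
        ($1 in want) { print $1 }
    '
}

# Intended use on the ESXi host (not executed by this sketch):
#   esxcli software vib list | leftover_nsx_vibs
```

If the pipeline prints nothing, none of the listed VIBs remain on the host. Any name it does print identifies a VIB that the removal step left behind.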