Uninstalling NSX-T from ESXi host failing with error "Failed to remove all host switches or logical switches"

Article ID: 322468

Updated On:

Products

VMware NSX

Issue/Introduction

  • While attempting to install NSX-T on a host, the installation fails at 18% with the following errors:
"failed to install software on a host <hostname>:java.rmi.RemoteException:

[Live installation error] Error in running ['/etc/init.d/nsx-opsagent', 'stop', 'upgrade']: Return code : 1 Output OK to upgrade nsx-opsagent stop nsx-opsagent stop watchdog

Terminating watchdog process with process PID 2105211 sh: you need to specify whom to kill nsx-ops-agent service is stopping cp: can't stat"
  • The Resolve option does not help, and running 'del nsx' in nsxcli on the ESXi host results in the following errors:
Exception when deleting nsx from host: ' error code: 4 stdout: delete_nsx_instance_from_host.sh: INFO: NSX reset script called with argument fabric_node on nsx-esx delete_nsx_instance_from_host.sh: INFO: Run transport_node reset on ESX node % Failed to remove all host switches or logical switches delete_nsx_instance_from_host.sh: ERROR: Failed to reset nsxa app of nsx-opsagent. Please check ospagent logs for more details. , stderr: <date-time> ERROR: Failed to reset nsxa app of nsx-opsagent. Please check ospagent logs for more details."
  • The /var/run/log/esxupdate.log file on the ESXi host shows that the vdl2 module fails to unload:
cpu48:4580298)Mod: 5098: Unloading module <vmk-module-uuid> ...

cpu48:4580298)vdl2: VDL2Cleanup:756: [nsx@6876 comp="nsx-esx" subcomp="<vmk-module-uuid>"]Starting cleanup

cpu48:4580298)ALERT: Mod: 5251: Failed to unload module <vmk-module-uuid>, since its consumed resource count is 1. Waiting...

cpu48:4580298)ALERT: Mod: 5280: Failed to unload module <vmk-module-uuid>, since its consumed resource count is
  • The following host properties are set to true on the DVS, as shown in the output of net-dvs -l:
com.vmware.nsx.kcp.enable

com.vmware.nsx.spf.enabled

com.vmware.nsx.vdl2.enabled

com.vmware.net.portset.fc.enabled

com.vmware.net.portset.fc.mcast.enabled
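
The check for these properties can be sketched as a small shell filter. This is a minimal illustration, not a VMware-provided tool: the function name nsx_enabled_props is hypothetical, and the filter assumes each property line has the form shown later in this article ("<name> = true , propType = CONFIG"). It reads net-dvs -l output on standard input and prints the names of the listed properties whose value is true.

```shell
# Hypothetical helper: list the NSX-related host properties reported as true.
# Assumes lines shaped like: "com.vmware.nsx.kcp.enable = true , propType = CONFIG"
nsx_enabled_props() {
    # Keep only the property families named in this article ...
    grep -E 'com\.vmware\.(nsx\.(kcp|spf|vdl2)|net\.portset\.fc)' |
        # ... then print just the property name when its value is "true".
        sed -n 's/^[[:space:]]*\(com\.vmware\.[^[:space:]]*\)[[:space:]]*=[[:space:]]*true.*/\1/p' |
        sort -u
}

# On the ESXi host the filter would be fed like this (not executed here):
#   net-dvs -l | nsx_enabled_props
```

An empty result corresponds to the case where none of the properties is enabled.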

 

Environment

VMware NSX-T Data Center

Cause

This occurs when the uninstall process cannot unload the module because certain advanced configurations are applied on the host switch.

Resolution

  • Confirm whether the following properties are present on the VDS:
# net-dvs -l | grep com.vmware.nsx.kcp
    • Output that confirms the issue looks similar to:
      com.vmware.nsx.kcp.enable = true , propType = CONFIG
    • Also check for the following properties: 
      • com.vmware.nsx.spf
      • com.vmware.nsx.vdl2
      • com.vmware.net.portset.fc.enabled
      • com.vmware.net.portset.fc.mcast
    • If the above commands return no output, the properties are not set.
  • To find the DVS names:
    • # esxcfg-vswitch -l
  • Disable each property that was found to be enabled, on each DVS in use, with the following syntax. Repeat the command for every enabled property:
# net-dvs -u "<property>" -p hostPropList <switchName>
    • For example, for DVS named RegionA01-VDS7:
# net-dvs -u com.vmware.nsx.kcp.enable -p hostPropList RegionA01-VDS7
# net-dvs -u com.vmware.nsx.spf.enabled -p hostPropList RegionA01-VDS7
# net-dvs -u com.vmware.nsx.vdl2.enabled -p hostPropList RegionA01-VDS7
  • Re-check the properties from the first step and confirm that the commands return no output.
  • Confirm that the nsx-opsagent service is running:
    # /etc/init.d/nsx-opsagent status
  • If the previous command shows that the nsx-opsagent service is not running, start it:
    # /etc/init.d/nsx-opsagent start
  • Place the ESXi host in vSphere maintenance mode, then run the following in the nsxcli shell on the ESXi host:
# nsxcli> del nsx
  • Confirm the NSX-T VIBs have been removed:
# esxcli software vib list | grep -i nsx
  • If NSX VIBs still remain on the ESXi host, open the host's MOB page at https://<esxi-ip>/mob/ and destroy the VDS or NVDS that has the properties listed above. Then reboot the host and run the del nsx command again.
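
The step that disables each property can be prepared as a dry run before anything is changed on the host. The sketch below only prints the net-dvs commands, using the syntax given in the Resolution steps, so they can be reviewed first. The function name print_unset_cmds is hypothetical, and the switch name and property list are taken from the example in this article; substitute the values found on the affected host.

```shell
# Hypothetical helper: print (do not execute) one net-dvs unset command
# per property for the given switch.
# Usage: print_unset_cmds <switchName> <property>...
print_unset_cmds() {
    switch=$1
    shift
    for prop in "$@"; do
        printf 'net-dvs -u %s -p hostPropList %s\n' "$prop" "$switch"
    done
}

# Example values from this article; replace with the switch name and the
# properties that were actually found enabled.
print_unset_cmds RegionA01-VDS7 \
    com.vmware.nsx.kcp.enable \
    com.vmware.nsx.spf.enabled \
    com.vmware.nsx.vdl2.enabled
```

Printing the commands instead of running them keeps the sketch safe to try anywhere. After review, the printed lines can be pasted into the ESXi shell.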