During an upgrade from NSX 3.1 to 3.2, the host VIB upgrade fails with the following error:
"Unexpected error while upgrading upgrade unit: Install of offline bundle failed on host xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx with error : [LiveInstallationError] VMware_bootbank_nsxcli_3.1.3.7.4-7.0.19761813: Failed to unmount tardisk nsxcli.v00 of VIB VMware_bootbank_nsxcli_3.1.3.7.4-7.0.19761813: Error in running [rm /tardisks/nsxcli.v00]: Return code: 1 Output: rm: can't remove '/tardisks/nsxcli.v00': Device or resource busy It is not safe to continue. Please reboot the host immediately to discard the unfinished update. cause = ('nsx-lcp-bundle(3.1.3.7.4-7.0.19761813)', "Failed to unmount tardisk nsxcli.v00 of VIB VMware_bootbank_nsxcli_3.1.3.7.4-7.0.19761813: Error in running [rm /tardisks/nsxcli.v00]:\nReturn code: 1\nOutput: rm: can't remove '/tardisks/nsxcli.v00': Device or resource busy\n") vibs = ['VMware_bootbank_nsxcli_3.1.3.7.4-7.0.19761813'] Please refer to the log file for more details."
VMware NSX-T
VMware NSX Data Center
This issue is caused by a cron job that periodically runs nsxcli "get port" commands and leaves CLI sessions open.
In this case, "get port" commands were executing during the upgrade. Running any nsxcli command during an upgrade is not recommended, because it can cause a "Device or resource busy" error.
As part of the upgrade workflow, ESXi kills all running nsxcli processes and then removes the nsxcli VIB. If a new nsxcli call starts after the existing processes are killed but before the VIB is removed, the upgrade fails.
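Before starting the upgrade, it can help to check whether a scheduled job invokes nsxcli. The following is a minimal sketch, not an official procedure: the function name find_nsxcli_cron_entries is hypothetical, and the crontab path in the usage note is an assumption that should be adjusted to wherever the job is actually defined.

```shell
#!/bin/sh
# Hypothetical helper: print crontab lines that invoke nsxcli,
# ignoring lines that are commented out.
# $1: path to a crontab file (assumed location, adjust as needed).
# Exit status is 0 if at least one active entry is found, non-zero otherwise.
find_nsxcli_cron_entries() {
    grep -v '^[[:space:]]*#' "$1" | grep 'nsxcli'
}
```

Example usage, assuming the job lives in the root crontab: find_nsxcli_cron_entries /var/spool/cron/crontabs/root. Any line it prints is a candidate to disable for the duration of the upgrade.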
esxupdate.log :
esxupdate.log shows the unmount of the nsxcli tardisk failing.
2024-05-01T10:41:22Z esxupdate: 176019139: LiveImageInstaller: DEBUG: Trying to unmount payload [nsxcli] of VIB VMware_bootbank_nsxcli_3.1.3.7.4-7.0.19761813
2024-05-01T10:41:22Z esxupdate: 176019139: LiveImageInstaller: DEBUG: Copying tardisk from /tardisks/nsxcli.v00 to /tmp/tardiskbackup/nsxcli.v00
2024-05-01T10:41:22Z esxupdate: 176019139: LiveImageInstaller: INFO: Unmounting nsxcli.v00...
2024-05-01T10:41:22Z esxupdate: 176019139: vmware.runcommand: INFO: runcommand called with: args = 'rm /tardisks/nsxcli.v00', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
2024-05-01T10:41:22Z esxupdate: 176019139: LiveImageInstaller: INFO: Received error: Error in running [rm /tardisks/nsxcli.v00]: Return code: 1 Output: rm: can't remove '/tardisks/nsxcli.v00': Device or resource busy Trying again. Attempt #1
2024-05-01T10:44:47Z esxupdate: 14640959: root: ERROR: vmware.esximage.Installer.LiveImageInstaller.ExecuteCommandError: Error in running [rm /tardisks/nsxcli.v00]:
2024-05-01T10:44:47Z esxupdate: 14640959: root: ERROR: Return code: 1
2024-05-01T10:44:47Z esxupdate: 14640959: root: ERROR: Output: rm: can't remove '/tardisks/nsxcli.v00': Device or resource busy
vmkernel.log :
vmkernel.log shows that nsxcli is still in use (busy).
2024-05-01T10:41:23.697Z cpu0:176021131)VisorFSTar: 790: nsxcli.v00
2024-05-01T10:41:25.705Z cpu15:176021133)VisorFSTar: 790: nsxcli.v00
2024-05-01T10:41:25.706Z cpu15:176021133)VisorFSTar: 716: inode 7727901472775502476 (nsxcli) is busy
nsxcli.log :
nsxcli.log shows continuous "get port" calls.
2024-05-01T10:04:32.251Z 14567559 cli.audit INFO CMD: get port (duration: 0.020s) (command: get ports), Operation status: CMD_EXECUTED
2024-05-01T10:04:32.253Z 14567559 cli INFO NSX CLI stopped for user: root
2024-05-01T10:04:34.255Z 14567657 cli INFO NSX CLI started (ESX) for user: root
2024-05-01T10:04:36.442Z 14567657 cli.descriptors.cli_command_service INFO CMD: get port
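To gauge how often new CLI sessions are being opened, the "NSX CLI started" entries in nsxcli.log can be counted for a given time window. This is a rough sketch only: the function name count_cli_starts is hypothetical, and the log path in the usage note is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: count "NSX CLI started" entries in an nsxcli log.
# $1: path to the log file (assumed location, adjust as needed).
# $2: optional timestamp prefix, e.g. "2024-05-01T10:4" for 10:40-10:49.
count_cli_starts() {
    awk -v p="$2" '
        # Keep lines that begin with the prefix (or all lines if none given)
        (p == "" || index($0, p) == 1) && /NSX CLI started/ { n++ }
        END { print n + 0 }
    ' "$1"
}
```

Example usage, assuming the log is at /var/log/nsxcli.log: count_cli_starts /var/log/nsxcli.log "2024-05-01T10:4". A count that keeps growing while no administrator is logged in suggests an automated caller such as a cron job.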
Ensure that no nsxcli session is active and no nsxcli commands are running during the upgrade, including any cron job or script that calls nsxcli.
Find the process ID (PID) of nsxcli using the "ps" command:
ps | grep -i "nsxcli"
Kill the running nsxcli process using the "kill" command:
kill <service_pid>
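After killing the process, the same "ps" check can be repeated to confirm that nothing has restarted nsxcli before the upgrade is retried. A minimal sketch, assuming "ps" output is piped in on standard input; the function name nsxcli_sessions_active is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ps" output on stdin and succeed (exit 0)
# if any line mentions nsxcli, ignoring the grep command itself.
nsxcli_sessions_active() {
    grep -i 'nsxcli' | grep -v 'grep' >/dev/null
}
```

Example usage: ps | nsxcli_sessions_active && echo "nsxcli is still running". If the message keeps appearing after a kill, something is most likely relaunching nsxcli and should be stopped first.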