During an upgrade from NSX 3.1 to 3.2, the VIB upgrade fails with the error below.
"Unexpected error while upgrading upgrade unit: Install of offline bundle failed on host xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx with error : [LiveInstallationError] VMware_bootbank_nsx-esx-datapath_3.1.3.7.4-7.0.19761813: Error in running [/etc/init.d/vdr-pre-upgrade stop upgrade]: Return code: 1 Output: stop upgrade begin Traceback (most recent call last): File "/etc/init.d/vdr-pre-upgrade", line 89, in <module> main() File "/etc/init.d/vdr-pre-upgrade", line 85, in main raise Exception("VDR Failed to perform full sync, %d" % status) Exception: VDR Failed to perform full sync, 0 It is not safe to continue. Please reboot the host immediately to discard the unfinished update. cause = ('nsx-lcp-bundle(3.1.3.7.4-7.0.19761813)', 'vdr-pre-upgrade', 'Error in running [/etc/init.d/vdr-pre-upgrade stop upgrade]:\nReturn code: 1\nOutput: stop upgrade begin\nTraceback (most recent call last):\n File "/etc/init.d/vdr-pre-upgrade", line 89, in <module>\n main()\n File "/etc/init.d/vdr-pre-upgrade", line 85, in main\n raise Exception("VDR Failed to perform full sync, %d" % status)\nException: VDR Failed to perform full sync, 0\n') vibs = ['VMware_bootbank_nsx-esx-datapath_3.1.3.7.4-7.0.19761813'] Please refer to the log file for more details."
VMware NSX-T
VMware NSX Data Center
The log below shows that nsxcli "get port" commands were executed during the upgrade. Running any nsxcli command during an upgrade is not recommended, because the upgrade will then fail with a "Device or resource busy" error.
nsxcli.log
2024-05-01T10:44:41.943Z 14645544 cli INFO NSX CLI started (ESX) for user: root
2024-05-01T10:44:44.286Z 14645544 cli.audit INFO CMD: get port (duration: 0.025s) (command: get ports), Operation status: CMD_EXECUTED
2024-05-01T10:44:44.287Z 14645544 cli INFO NSX CLI stopped for user: root
2024-05-01T10:44:44.908Z 14645639 cli INFO NSX CLI started (ESX) for user: root
2024-05-01T10:44:47.190Z 14645639 cli.audit INFO CMD: get port (duration: 0.025s) (command: get ports), Operation status: CMD_EXECUTED
2024-05-01T10:44:47.191Z 14645639 cli INFO NSX CLI stopped for user: root
2024-05-01T10:44:48.271Z 14645786 cli INFO NSX CLI started (ESX) for user: root
2024-05-01T10:44:49.322Z 14645850 cli INFO NSX CLI started (ESX) for user: root
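To confirm which nsxcli commands ran around the time of the upgrade, the audit entries can be filtered out of the log. The following is a minimal sketch, not an official tool: the function name list_nsxcli_commands is hypothetical, and the default path /var/log/nsxcli.log is an assumption about where the log lives on the host, so pass the actual path as the first argument if it differs.

```shell
# Hypothetical helper: print the audited nsxcli commands recorded in a log file.
# The default path is an assumption; pass the real log location as $1.
list_nsxcli_commands() {
    log_file="${1:-/var/log/nsxcli.log}"
    # Audit entries carry the "cli.audit" tag and a "CMD:" field,
    # as in the excerpt above.
    grep 'cli\.audit' "$log_file" | grep 'CMD:'
}
```

Calling list_nsxcli_commands with no argument reads the assumed default path. Compare the timestamps of the printed lines with the start time of the upgrade attempt.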
Because nsxcli "get port" commands were executed during the upgrade, the nsxcli tardisk was busy and could not be removed. This failure leaves the host in an inconsistent state, and the host must be rebooted before the upgrade is retried.
esxupdate.log
2024-05-01T11:54:58Z esxupdate: 44000081: LiveImageInstaller: WARNING: Handling Live Vib Failure: VMware_bootbank_nsxcli_3.1.3.7.4-7.0.19761813: Failed to unmount tardisk nsxcli.v00 of VIB VMware_bootbank_nsxcli_3.1.3.7.4-7.0.19761813: Error in running [rm /tardisks/nsxcli.v00]: Return code: 1 Output: rm: can't remove '/tardisks/nsxcli.v00': Device or resource busy
...
2024-05-01T11:55:50Z esxupdate: 44000081: root: ERROR: vmware.esximage.Errors.LiveInstallationError: VMware_bootbank_nsxcli_3.1.3.7.4-7.0.19761813: VMware_bootbank_nsxcli_3.1.3.7.4-7.0.19761813: Failed to unmount tardisk nsxcli.v00 of VIB VMware_bootbank_nsxcli_3.1.3.7.4-7.0.19761813: Error in running [rm /tardisks/nsxcli.v00]:
2024-05-01T11:55:50Z esxupdate: 44000081: root: ERROR: Return code: 1
2024-05-01T11:55:50Z esxupdate: 44000081: root: ERROR: Output: rm: can't remove '/tardisks/nsxcli.v00': Device or resource busy
2024-05-01T11:55:50Z esxupdate: 44000081: root: ERROR:
2024-05-01T11:55:50Z esxupdate: 44000081: root: ERROR: It is not safe to continue. Please reboot the host immediately to discard the unfinished update.
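Before retrying an upgrade, it can help to check whether an earlier attempt already hit this tardisk failure. The sketch below searches a log file for the "Device or resource busy" message shown above. The function name check_tardisk_busy is hypothetical, and the default path /var/log/esxupdate.log is an assumption; pass the actual path as the first argument if it differs.

```shell
# Hypothetical helper: report whether a log file records a busy-tardisk failure.
# The default path is an assumption; pass the real log location as $1.
check_tardisk_busy() {
    log_file="${1:-/var/log/esxupdate.log}"
    if grep -q "Device or resource busy" "$log_file"; then
        echo "Busy-tardisk failure found: reboot the host before retrying the upgrade"
        return 1
    fi
    echo "No busy-tardisk failure found"
    return 0
}
```

A non-zero return status means the message was found. Note that the log may still contain entries from before the last reboot, so compare the timestamps with the host's boot time.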
The vdr-pre-upgrade failure is not the underlying problem. It is expected behavior: an earlier upgrade attempt had failed (because nsxcli commands were executed during the upgrade) and left the host in an inconsistent state. When removal of the nsxcli VIB fails, the host must be rebooted before the upgrade is retried. In this case the host was not rebooted before the retry, so vdr-pre-upgrade failed with the error above.
Ensure that no nsxcli session is active and no nsxcli commands are running on the host(s) during the upgrade.
Find the process ID (PID) of any running nsxcli process using the "ps" command:
ps | grep -i "nsxcli"
Kill the running "nsxcli" process using the "kill" command:
kill <service_pid>
Then reboot the host and confirm again that no nsxcli session is active and no nsxcli command is run before the upgrade is attempted.
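The check in the steps above can be wrapped in a small pre-upgrade guard. This is a sketch only: the function name nsxcli_running is hypothetical, and it reads a process listing on standard input, so it makes no assumption about the column layout of "ps" on the host. It also filters out the grep process itself, which otherwise matches its own search pattern.

```shell
# Hypothetical helper: succeed (status 0) if the process listing on stdin
# contains an nsxcli process, printing the matching lines.
nsxcli_running() {
    # Exclude the grep process itself, which would otherwise always match.
    grep -i 'nsxcli' | grep -v 'grep'
}
```

For example, `ps | nsxcli_running` prints any remaining nsxcli processes and returns a non-zero status when there are none, so `if ps | nsxcli_running; then echo "nsxcli still active"; fi` can be used to gate the upgrade.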