NSX-T host upgrade fails on hosts with the error: File "/etc/init.d/nsx-datapath-dl", <module> DualLoadUpgrade()
In a VCF environment, this upgrade is performed by VCF calling the NSX API.
In /var/log/esxupdate.log, you see errors similar to the following:
2021-09-24T14:03:52Z esxupdate: 2117331: LiveImageInstaller: DEBUG: output: Error: More than one exception specification for tardisk /tardisks/nsx_neto.v00
Error: Ignoring /etc/vmware/secpolicy/tardisks/netopa
2021-09-24T14:04:03Z esxupdate: 2117331: LiveImageInstaller: DEBUG: output: start upgrade begin
Exception: Traceback (most recent call last):
  File "/etc/init.d/nsx-datapath-dl", line 970, in <module>
    DualLoadUpgrade()
  File "/etc/init.d/nsx-datapath-dl", line 870, in DualLoadUpgrade
    vs.RTM_UpgradeOp(psName, fromBuildModIDList, toBuildModIDList)
  File "/lib64/python3.5/nsx/lib/libvswitch.py", line 7412, in RTM_UpgradeOp
    'status: %d' % status)
Exception: FAILED: RTM Command status: 195887107
2021-09-24T14:04:03Z esxupdate: 2117331: LiveImageInstaller: WARNING: Handling Live Vib Failure: Error in running ['/etc/init.d/nsx-datapath-dl', 'start', 'upgrade']:
Return code: 1
Output: start upgrade begin
Exception: Traceback (most recent call last):
  File "/etc/init.d/nsx-datapath-dl", line 970, in <module>
    DualLoadUpgrade()
  File "/etc/init.d/nsx-datapath-dl", line 870, in DualLoadUpgrade
    vs.RTM_UpgradeOp(psName, fromBuildModIDList, toBuildModIDList)
  File "/lib64/python3.5/nsx/lib/libvswitch.py", line 7412, in RTM_UpgradeOp
    'status: %d' % status)
Exception: FAILED: RTM Command status: 195887107
It is not safe to continue. Please reboot the host immediately to discard the unfinished update.
2021-09-24T14:04:03Z esxupdate: 2117331: LiveImageInstaller: DEBUG: Running [['/etc/init.d/nsx-pre-cfgagent', 'stop', 'upgrade']]...
2021-09-24T14:04:26Z esxupdate: 2117331: LiveImageInstaller: WARNING: Error in running ['/etc/init.d/nsx-pre-cfgagent', 'start', 'upgrade']:
Return code: 1
Output: /tmp/nsx2/.nsx_dp_upgrade_in_progress exists, returning
It is not safe to continue. Please reboot the host immediately to discard the unfinished update.
2021-09-24T14:04:26Z esxupdate: 2117331: LiveImageInstaller: WARNING: Error in running ['/etc/init.d/nsx-pre-idps', 'start', 'upgrade']:
Return code: 1
Output: /tmp/nsx2/.nsx_dp_upgrade_in_progress exists, returning
It is not safe to continue. Please reboot the host immediately to discard the unfinished update.
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: Traceback (most recent call last):
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: File "/build/mts/release/bora-15843807/bora/build/esx/release/vmvisor/esxupdate/lib64/python3.5/site-packages/vmware/esximage/Installer/LiveImageInstaller.py", line 555, in Remediate
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: File "/build/mts/release/bora-15843807/bora/build/esx/release/vmvisor/esxupdate/lib64/python3.5/site-packages/vmware/esximage/Installer/LiveImageInstaller.py", line 1034, in _StartServices
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: File "/build/mts/release/bora-15843807/bora/build/esx/release/vmvisor/esxupdate/lib64/python3.5/site-packages/vmware/esximage/Installer/LiveImageInstaller.py", line 1543, in RunCmdWithMsg
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: vmware.esximage.Errors.InstallationError: Error in running ['/etc/init.d/nsx-datapath-dl', 'start', 'upgrade']:
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: Return code: 1
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: Output: start upgrade begin
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: Exception:
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: Traceback (most recent call last):
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: File "/etc/init.d/nsx-datapath-dl", line 970, in <module>
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: DualLoadUpgrade()
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: File "/etc/init.d/nsx-datapath-dl", line 870, in DualLoadUpgrade
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: vs.RTM_UpgradeOp(psName, fromBuildModIDList, toBuildModIDList)
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: File "/lib64/python3.5/nsx/lib/libvswitch.py", line 7412, in RTM_UpgradeOp
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: 'status: %d' % status)
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: Exception: FAILED: RTM Command status: 195887107
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR:
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: It is not safe to continue. Please reboot the host immediately to discard the unfinished update.
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR:
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: During handling of the above exception, another exception occurred:
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR:
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: Traceback (most recent call last):
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: File "/usr/lib/vmware/esxcli-software", line 790, in <module>
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: main()
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: File "/usr/lib/vmware/esxcli-software", line 781, in main
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: ret = CMDTABLE[command](options)
2021-09-24T14:04:58Z esxupdate: 2117331: root: ERROR: File "/usr/lib/vmware/esxcli-software", line 612, in VibInstallCmd
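The RTM command status in the esxupdate.log excerpt is printed in decimal, which is awkward to compare against status codes that are normally listed in hexadecimal. The following is a minimal Python sketch of the conversion only. The variable names are illustrative, and reading the result as a "Not found" status is an assumption based on the "status: Not found" messages in the vmkernel.log excerpt, not something the esxupdate.log output states.

```python
# Status value copied from the "RTM Command status" line in esxupdate.log.
rtm_status = 195887107

# Convert to hexadecimal so it can be compared against hex status listings.
rtm_status_hex = hex(rtm_status)

print(rtm_status_hex)  # 0xbad0003
```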
In /var/log/vmkernel.log, you see messages similar to the following:
2021-09-24T14:04:02.720Z cpu9:2118456)WARNING: nsx_vdrb: VdrRtmConnectVdrPort:317: [nsx@6876 comp="nsx-esx" subcomp="vdrb-16887201"]SYS:VDR connection not created for DVS 50 37 78 bc 40 9a fd de-3a e5 87 0f 98 93 0f 0c status: Not found
2021-09-24T14:04:02.720Z cpu9:2118456)WARNING: nsx_vdrb: VdrRtmPSRuntimeRestore:1598: [nsx@6876 comp="nsx-esx" subcomp="vdrb-16887201" errorCode="ESX3"]CP:Failed to connect to the VDR port, ps DvsPortset-0 status: Not found
2021-09-24T14:04:02.720Z cpu9:2118456)RTM_PsRuntimeRestore:1601:[nsx@6876 comp="nsx-esx" subcomp="rtm" errorCode="ESX3"]PS runtime restore CB failed for client vdrb ps DvsPortset-0 : Not found
2021-09-24T14:04:02.720Z cpu9:2118456)nsx_vdrb: VdrRtmPSRuntimeComplete:1651: [nsx@6876 comp="nsx-esx" subcomp="vdrb-15945993"]CP:VDR RTM PS Runtime complete, status = 2
2021-09-24T14:04:02.720Z cpu9:2118456)RTM_PsRuntimeComplete:1414:[nsx@6876 comp="nsx-esx" subcomp="rtm"]PS complete CB called for client vdrb ps DvsPortset-0
2021-09-24T14:04:02.720Z cpu9:2118456)vdl2: VDL2RtmPSRuntimeComplete:2352: [nsx@6876 comp="nsx-esx" subcomp="vdl2-15945993"]RTM PS Runtime complete CB, status = 2
2021-09-24T14:04:02.720Z cpu9:2118456)RTM_PsRuntimeComplete:1414:[nsx@6876 comp="nsx-esx" subcomp="rtm"]PS complete CB called for client vdl2 ps DvsPortset-0
2021-09-24T14:04:02.720Z cpu9:2118456)CharDevRTMPsUpgrade:617:[nsx@6876 comp="nsx-esx" subcomp="rtm" errorCode="ESX3"]PortSet runtime restore failed of ps DvsPortset-0 : Not found
Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values may vary depending on your environment.
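To confirm that a host shows the symptoms described above, the two logs can be searched for the characteristic strings from the excerpts. The following is a minimal Python sketch under stated assumptions: the function and constant names are hypothetical, the signature strings are copied from the example excerpts, and requiring every signature to be present is an illustrative choice rather than an official detection rule.

```python
# Signature strings copied from the example log excerpts in this article.
ESXUPDATE_SIGNATURES = (
    "/etc/init.d/nsx-datapath-dl",
    "DualLoadUpgrade",
    "FAILED: RTM Command status",
)
VMKERNEL_SIGNATURES = (
    "VDR connection not created for DVS",
    "PortSet runtime restore failed",
)


def read_log(path):
    """Read a log file as text, replacing any undecodable bytes."""
    with open(path, "r", errors="replace") as handle:
        return handle.read()


def find_signatures(log_text, signatures):
    """Return the signatures that occur anywhere in log_text, in order."""
    return [sig for sig in signatures if sig in log_text]


def matches_issue(esxupdate_text, vmkernel_text):
    """Hypothetical check: True only if every signature is found in its log."""
    esx_hits = find_signatures(esxupdate_text, ESXUPDATE_SIGNATURES)
    vmk_hits = find_signatures(vmkernel_text, VMKERNEL_SIGNATURES)
    return (len(esx_hits) == len(ESXUPDATE_SIGNATURES)
            and len(vmk_hits) == len(VMKERNEL_SIGNATURES))
```

For example, calling matches_issue(read_log("/var/log/esxupdate.log"), read_log("/var/log/vmkernel.log")) on a copy of the host logs returns True only when all of the strings above are present. A False result does not rule out the issue, because log messages may differ between builds.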
Environment
VMware vSphere ESXi 7.0.1
VMware NSX-T Data Center
VMware vSphere ESXi 7.0.0
VMware vSphere ESXi 7.0.2
Cause
During the 24-hour dvport sync workflow between VC and ESXi, the dvports owned by NSX, such as VTEP ports, hyperbus ports, the VDR port, and the SPF port, are reported to VC. In some cases, the SPF port is unexpectedly removed.
Workaround
NOTE: Perform the steps below only on impacted ESXi hosts.
To work around this issue, manually install the VIB using the no-live method.
You can download the NSX-T Data Center VIBs manually and make them part of the host image. The download paths can change for each release of NSX-T Data Center. Always check the NSX-T Data Center downloads page to get the appropriate VIBs.
Log in to the host as root or as a user with administrative privileges.
Navigate to the /tmp directory.
[root@host:~]: cd /tmp
Download and copy the nsx-lcp file into the /tmp directory.