For Grindcore users 3.0, 3.0.1 during NSX upgrade, the issue is seen during ESX Host Upgrade stage with the error: [LiveInstallationError] Error in running ['/etc/init.d/nsx-datapath-dl', 'start', 'upgrade']


Article ID: 330530


Products

VMware NSX

Issue/Introduction

Symptoms:
While upgrading NSX on Transport Nodes running Grindcore 3.0 or Grindcore 3.0.1, the ESX Host Upgrade stage fails with the following error: [LiveInstallationError] Error in running ['/etc/init.d/nsx-datapath-dl', 'start', 'upgrade'].

Log snippet:

Failed to install software on host. Failed to install software on host. spwyvp2007.spwynet.com : java.rmi.RemoteException: [LiveInstallationError] Error in running ['/etc/init.d/nsx-datapath-dl', 'start', 'upgrade']:
Return code: 1
Output: start upgrade begin
Exception: Traceback (most recent call last):
  File "/etc/init.d/nsx-datapath-dl", line 970, in <module>
    DualLoadUpgrade()
  File "/etc/init.d/nsx-datapath-dl", line 870, in DualLoadUpgrade
    vs.RTM_UpgradeOp(psName, fromBuildModIDList, toBuildModIDList)
  File "/lib64/python3.5/nsx/lib/libvswitch.py", line 7412, in RTM_UpgradeOp
    'status: %d' % status)
Exception: FAILED: RTM Command status: 195887107
It is not safe to continue. Please reboot the host immediately to discard the unfinished update.
Please refer to the log file for more details
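
To triage many hosts quickly, the error signature in the log snippet can be matched programmatically. The following is a minimal sketch, not an official VMware tool: the function name `check_log` and the `UpgradeFailure` type are hypothetical, and the patterns are taken only from the message text shown above. The RTM status value may differ between hosts, so it is captured rather than matched literally.

```python
import re
from typing import NamedTuple, Optional

# Patterns copied from the error text in this article (assumption: the
# wording is stable across affected builds).
SIGNATURE = re.compile(
    r"\[LiveInstallationError\] Error in running "
    r"\['/etc/init\.d/nsx-datapath-dl', 'start', 'upgrade'\]"
)
RTM_STATUS = re.compile(r"FAILED: RTM Command status: (\d+)")


class UpgradeFailure(NamedTuple):
    matched: bool              # True if the LiveInstallationError signature is present
    rtm_status: Optional[int]  # RTM status code if one was logged, else None


def check_log(log_text: str) -> UpgradeFailure:
    """Report whether log_text contains the nsx-datapath-dl upgrade failure."""
    if not SIGNATURE.search(log_text):
        return UpgradeFailure(False, None)
    status = RTM_STATUS.search(log_text)
    return UpgradeFailure(True, int(status.group(1)) if status else None)
```

The input can be the error text shown by the upgrade workflow or the contents of a collected log file read into a string. A match only indicates that the host hit this error signature; it does not by itself confirm the root cause.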

The issue occurs because CVDS host switches that were removed from the Transport Node config are not hotswapped to the cswitch module and are still shown as managed by NSX.
It occurs even when no CVDS host switch is currently part of the Transport Node config, as long as the Transport Node previously had one that was later removed.


Environment

VMware NSX-T Data Center

Cause


Customers running NSX-T 3.0 or 3.0.1 are likely to hit this issue if they have configured Transport Nodes with CVDSes and attempt to upgrade to NSX-T 3.0.2 or later.

Resolution

The issue is fixed in NSX-T 3.0.2, where a scheduled task runs over all CVDSes and hotswaps those that were removed from the Transport Node config to the cswitch module.

Relevant log location:
On the host: /var/log/upgrade-coordinator/upgrade-coordinator.log
To check whether any NSX-managed CVDSes exist, run the following command on the host:
nsxdp-cli vswitch instance list
If any are found, follow the workaround of deleting the Transport Node so that all CVDS host switches are hotswapped to the cswitch module.
Steps to reproduce:
The issue can be reproduced by configuring Transport Nodes with CVDS and attempting an upgrade from Grindcore 3.0 or 3.0.1 to a later release.

Workaround:
If the Transport Nodes contain only NVDS host switches (the CVDS host switches were already removed), delete the Transport Nodes, recreate them with NVDS-only host switches, and then attempt the upgrade.

If the Transport Nodes still contain CVDS host switches, remove the CVDS host switches and delete the Transport Node so that the CVDSes are hotswapped to cswitch, then attempt the upgrade. Once the upgrade succeeds, add the CVDSes back to the Transport Node config.

Additional Information

Impact/Risks:
Transport Nodes/ESX hosts end up in an error state, with a potential loss of connectivity for VMs.

Attachments

nsxcfg-vswitch