Logical Switches in failed state upon upgrade to VMware NSX 4.1.x



Article ID: 371405



Products

VMware NSX

Issue/Introduction

After an upgrade to VMware NSX 4.x, the LogicalSwitchState (Config State) of some Logical Switches shows as failed in the Manager view.

  • No functional issues found with the LogicalSwitch when "Config State" is showing "Failed."
  • Newly created LogicalSwitches after upgrade have no issues.

Similar log entries may be found in proton/nsxapi.log:

2024-06-03T17:15:35.160Z ERROR l2VcFullSyncScheduler1 NsxPortgroupExecuteVcUtils 86947 SWITCHING [nsx@6876 comp="nsx-manager" errorCode="MP9538" level="ERROR" subcomp="manager"] Could not find the HostSwitch [<UUID>] of type VDS
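To check how many distinct HostSwitch IDs are reported this way, the log can be scanned for the MP9538 error. The following is a minimal sketch, not part of the official workaround: the function name is hypothetical, the pattern is derived only from the sample line above, and it assumes the real log prints a standard 36-character UUID where the sample shows <UUID>.

```python
import re

# Pattern derived from the sample log line in this article.
# Assumption: the real log shows a 36-character UUID in place of <UUID>.
MP9538_RE = re.compile(
    r'errorCode="MP9538".*?'
    r'Could not find the HostSwitch \[([0-9a-fA-F-]{36})\] of type VDS'
)


def find_missing_hostswitches(lines):
    """Return the sorted, de-duplicated HostSwitch UUIDs from MP9538 errors."""
    found = set()
    for line in lines:
        match = MP9538_RE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


# Illustrative input only; the UUID below is made up for this example.
example_lines = [
    '2024-06-03T17:15:35.160Z ERROR l2VcFullSyncScheduler1 '
    'NsxPortgroupExecuteVcUtils 86947 SWITCHING [nsx@6876 comp="nsx-manager" '
    'errorCode="MP9538" level="ERROR" subcomp="manager"] Could not find the '
    'HostSwitch [11111111-2222-3333-4444-555555555555] of type VDS',
    '2024-06-03T17:15:36.000Z INFO unrelated line',
]
for uuid in find_missing_hostswitches(example_lines):
    print(uuid)
```

To use it against a real log, pass an open file handle for nsxapi.log instead of example_lines; the exact log location on the NSX Manager is not stated in this article and should be confirmed first.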

 

Running the following command as root on an NSX Manager may show similar entries for the affected Logical Switch:

#corfu_tool_runner.py -n nsx -t LogicalSwitchState -o showTable > LSSb4.txt

"############-############": {
  "portGroup": {
    "cmId": "<CM-UUID>",
    "portGroupKey": "dvportgroup-<pgID>"
  },
  "state": "PORT_GROUP_STATE_ENUM_FAILEDFORDELETE"
}

Notice that the ID of this logical switch resembles "##################gga1a2d3" rather than the normal format, such as "## ## ## ## ## ## ## bb-cc dd ee ff gg a1 a2 d3".

Also, the state of the logical switch in question is "PORT_GROUP_STATE_ENUM_FAILEDFORDELETE" instead of "PORT_GROUP_STATE_ENUM_SUCCESS".
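To get a quick overview of how many port group entries are in each state, the saved table dump can be summarized. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and it relies only on the "state" field format shown in the fragment above, not on the full structure of the showTable output.

```python
import re
from collections import Counter

# Matches the "state" field as it appears in the fragment above.
# Assumption: every state value starts with PORT_GROUP_STATE_ENUM_.
STATE_RE = re.compile(r'"state"\s*:\s*"(PORT_GROUP_STATE_ENUM_[A-Z_]+)"')


def count_port_group_states(text):
    """Count how often each port group state value occurs in the dump text."""
    return Counter(STATE_RE.findall(text))


# Fragment copied from this article, used here as sample input.
sample = '''
"############-############": {
  "portGroup": {
    "cmId": "<CM-UUID>",
    "portGroupKey": "dvportgroup-<pgID>"
  },
  "state": "PORT_GROUP_STATE_ENUM_FAILEDFORDELETE"
}
'''
for state, count in count_port_group_states(sample).items():
    print(state, count)
```

Reading the contents of LSSb4.txt into a string and passing it to count_port_group_states gives the totals for the whole environment; a non-zero count for PORT_GROUP_STATE_ENUM_FAILEDFORDELETE suggests stale entries of the kind described here.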

Environment

VMware NSX 4.x

Cause

  • This is caused by stale entries in the vdsIdPortGroupStateMap of the Logical Switch config.
  • Logical Switches created in NSX 3.2.x and later releases are not affected.
  • The issue is seen with Logical Switches created in older releases such as 3.0.x/3.1.x or earlier.

Resolution

This is a known issue impacting VMware NSX.
The following workaround is non-impacting; however, it is always recommended to perform fixes within a scheduled maintenance window.

 

Workaround:

Once the state is cleared in the current release, this problem will not be seen when upgrading to a higher release.
We have developed a script to fix this issue. 

  • For NSX 4.2.x and 4.1.x, please download script: del_dvpg_4.1.x_3407553_0916_allLS.py
  • For NSX 4.0.x, please download script: del_dvpg_4.0.x_3407553_0801.py

 

SSH as root into the vCenter Server VM where the logical switches are present:

  1. #/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "select ID, DVS_ID, DVPORTGROUP_KEY, LOGICALSWITCH_UUID from vpx_dvportgroup;" | grep dvportgroup- | awk '{print $5}' > dvpgport_info.txt
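Before copying the file to the NSX Manager, it may be worth confirming that the extraction produced what is expected. The sketch below assumes that the fifth whitespace-separated field of the psql output is the DVPORTGROUP_KEY column, so that each line of dvpgport_info.txt holds one key of the form dvportgroup-<number>; this is an inference from the query above, not something the article states, and the function name is hypothetical.

```python
import re

# Assumption: each non-empty line should be a key such as "dvportgroup-123".
KEY_RE = re.compile(r"^dvportgroup-\d+$")


def check_port_group_keys(lines):
    """Split input lines into (valid_keys, rejected_lines), skipping blanks."""
    valid, rejected = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if KEY_RE.match(line):
            valid.append(line)
        else:
            rejected.append(line)
    return valid, rejected


# Illustrative input only; these keys are made up for this example.
valid, rejected = check_port_group_keys(["dvportgroup-101\n", "dvportgroup-102\n", "|\n"])
print(len(valid), "valid keys;", len(rejected), "unexpected lines")
```

Passing an open handle for dvpgport_info.txt instead of the sample list checks the real file. Any rejected lines suggest the column layout differs from the assumption above and the file should be reviewed by hand before the script is run.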

Upload the script and the dvpgport_info.txt file to the /tmp directory of any one of the NSX Manager nodes, and execute the following commands as the root user:

  1. Please make a backup of the NSX Managers before proceeding with the following steps.
  2. #corfu_tool_runner.py -n nsx -t LogicalSwitchState -o showTable > LSSb4.txt
  3. Change directory to the location of the Python script (e.g. #cd /tmp)
  4. #python3 del_dvpg_4.1.x_3407553_0916_allLS.py dvpgport_info.txt
  5. #corfu_tool_runner.py -n nsx -t LogicalSwitchState -o showTable > LSSafter.txt
  6. Run "start search resync all" on all the Manager nodes in admin mode.

If the Config State still shows failed after the resync command and the environment is running NSX 4.2.0, perform the following steps:

  • Edit the logical segment from Policy mode
  • Edit the description. You can enter text and then remove it to leave the field blank if desired.
  • Save the configuration and refresh the view
  • Confirm the Logical Switch is now showing Config State: Success from the Manager mode view

For fixing individual Logical Switches in NSX 4.2.x and 4.1.x, please download del_dvpg_4.1.x_3407553_0916.py and use the following syntax:

  • #python3 del_dvpg_4.1.x_3407553_0916.py dvpgport_info.txt <logical_switch_id>

If there are issues after running the script, please open a Broadcom Support Case, attach the dvpgport_info.txt, LSSb4.txt, and LSSafter.txt files, and reference this KB article.

Additional Information

The same symptoms can also be seen when hitting another known issue, as explained in KB: Logical Switch Status may incorrectly show as FAILED with no impact to realization

Attachments

del_dvpg_4.1.x_3407553_0916.py
del_dvpg_4.1.x_3407553_0916_allLS.py
del_dvpg_4.0.x_3407553_0801.py