Logical Switches in failed state upon upgrade to VMware NSX 4.1.x

Article ID: 371405

Updated On: 12-13-2024

Products

VMware NSX

Issue/Introduction

After upgrading to VMware NSX 4.x, the Config State (LogicalSwitchState) of some Logical Switches shows as Failed in the Manager view.

  • No functional issues have been found with a Logical Switch whose Config State shows "Failed."
  • Logical Switches created after the upgrade are not affected.

Log entries similar to the following may be found in proton/nsxapi.log:


2024-06-03T17:15:35.160Z ERROR l2VcFullSyncScheduler1 NsxPortgroupExecuteVcUtils 86947 SWITCHING [nsx@6876 comp="nsx-manager" errorCode="MP9538" level="ERROR" subcomp="manager"] Could not find the HostSwitch [<UUID>] of type VDS


Running the following command as root on the NSX Manager saves the LogicalSwitchState table to LSSb4.txt:

#corfu_tool_runner.py -n nsx -t LogicalSwitchState -o showTable > LSSb4.txt

The output may contain entries similar to the following for the affected Logical Switch:

"############-############": {
  "portGroup": {
    "cmId": "<CM-UUID>",
    "portGroupKey": "dvportgroup-<pgID>"
  },
  "state": "PORT_GROUP_STATE_ENUM_FAILEDFORDELETE"
}

Notice that the ID of this Logical Switch has a format similar to "##################gga1a2d3" instead of the normal format, such as "## ## ## ## ## ## ## bb-cc dd ee ff gg a1 a2 d3".

Also, the state of the logical switch in question is "PORT_GROUP_STATE_ENUM_FAILEDFORDELETE" instead of "PORT_GROUP_STATE_ENUM_SUCCESS".
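To check a saved dump for affected entries, a minimal sketch such as the following can tally the PORT_GROUP_STATE_ENUM_* values in LSSb4.txt. It assumes only that the state strings appear literally in the showTable output, as in the excerpt above; it does not parse the full corfu_tool_runner.py output format, and the script name count_ls_states.py and the helper names are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches state values such as PORT_GROUP_STATE_ENUM_SUCCESS and
# PORT_GROUP_STATE_ENUM_FAILEDFORDELETE wherever they appear in the text.
STATE_RE = re.compile(r"PORT_GROUP_STATE_ENUM_[A-Z_]+")
FAILED = "PORT_GROUP_STATE_ENUM_FAILEDFORDELETE"


def count_port_group_states(dump_text: str) -> Counter:
    """Tally every PORT_GROUP_STATE_ENUM_* value found in a showTable dump."""
    return Counter(STATE_RE.findall(dump_text))


def report(path: str) -> int:
    """Print per-state counts for the dump at `path`; return the FAILEDFORDELETE count."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        counts = count_port_group_states(fh.read())
    for state, n in sorted(counts.items()):
        print(f"{state}: {n}")
    return counts[FAILED]


# Runs only when a dump file is passed, e.g. python3 count_ls_states.py LSSb4.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    failed = report(sys.argv[1])
    # Exit status 1 signals that FAILEDFORDELETE entries are present.
    sys.exit(1 if failed else 0)
```

A non-zero FAILEDFORDELETE count suggests the environment matches this issue; the count alone does not identify which Logical Switch each entry belongs to.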


Environment

VMware NSX 4.x

Cause

This issue is caused by stale entries in the vdsIdPortGroupStateMap of the Logical Switch configuration.

Logical Switches created in NSX 3.2.x or later are not affected.

The issue is seen on Logical Switches created in older releases, such as 3.0.x, 3.1.x, or earlier.

Resolution

This issue is resolved in VMware NSX 4.2.1. Once the stale state is cleared in the currently running release, the problem will not be seen when upgrading to a later release.

Workaround:

We have developed a script to fix this issue. Download the version that matches your NSX release:

  • For NSX 4.1.x, download the script: del_dvpg_4.1.x_3407553_0916_allLS.py
  • For NSX 4.0.x, download the script: del_dvpg_4.0.x_3407553_0801.py


SSH as root into the vCenter Server VM where the Logical Switches are present, and run the following command to save the distributed port group keys to dvpgport_info.txt:

  1. #/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "select ID, DVS_ID, DVPORTGROUP_KEY, LOGICALSWITCH_UUID from vpx_dvportgroup;" | grep dvportgroup- | awk '{print $5}' > dvpgport_info.txt
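The grep and awk stages of this pipeline keep only rows that mention a port group key and print the fifth whitespace-separated field. The following Python equivalent is a minimal sketch, assuming psql's default aligned output (columns separated by " | "), that shows why that field is the DVPORTGROUP_KEY column. The function name extract_dvportgroup_keys is hypothetical and the sample rows are invented for illustration; on the appliance, the shell pipeline itself is what should be run.

```python
from typing import List


def extract_dvportgroup_keys(psql_output: str) -> List[str]:
    """Mirror `grep dvportgroup- | awk '{print $5}'` on psql aligned output."""
    keys = []
    for line in psql_output.splitlines():
        # grep dvportgroup-: keep only rows that mention a port group key.
        # The header column name "dvportgroup_key" uses an underscore, so it is skipped.
        if "dvportgroup-" not in line:
            continue
        # awk splits on runs of whitespace, so in a row such as
        # " 1001 | 2001 | dvportgroup-45 | <uuid>" the "|" separators are
        # fields 2 and 4, and field 5 is the DVPORTGROUP_KEY value.
        fields = line.split()
        # awk prints an empty line when a row has fewer than 5 fields.
        keys.append(fields[4] if len(fields) >= 5 else "")
    return keys


if __name__ == "__main__":
    # Invented sample rows in psql's aligned format.
    sample = """\
  id  | dvs_id | dvportgroup_key | logicalswitch_uuid
------+--------+-----------------+--------------------
 1001 |   2001 | dvportgroup-45  | example-ls-uuid
 1002 |   2001 | dvportgroup-46  |
(2 rows)
"""
    for key in extract_dvportgroup_keys(sample):
        print(key)  # prints dvportgroup-45, then dvportgroup-46
```

The resulting dvpgport_info.txt therefore holds one distributed port group key per line.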

Upload the script and the dvpgport_info.txt file to any one NSX Manager node in the cluster, and run the following commands as the root user:

  1. Take a backup of the NSX Managers before proceeding with the following steps.
  2. #corfu_tool_runner.py -n nsx -t LogicalSwitchState -o showTable > LSSb4.txt
  3. #python3 del_dvpg_4.1.x_3407553_0916_allLS.py dvpgport_info.txt
  4. #corfu_tool_runner.py -n nsx -t LogicalSwitchState -o showTable > LSSafter.txt
  5. Run the NSX CLI command: start search resync all
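Because LSSb4.txt and LSSafter.txt capture the same table before and after the script runs, comparing their state counts gives a quick sanity check. This is a minimal sketch, assuming only that the PORT_GROUP_STATE_ENUM_* strings appear literally in both dumps; the script name compare_ls_states.py and the helper names are hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Dict, Tuple

# Matches values such as PORT_GROUP_STATE_ENUM_FAILEDFORDELETE in the dump text.
STATE_RE = re.compile(r"PORT_GROUP_STATE_ENUM_[A-Z_]+")


def state_counts(path: str) -> Counter:
    """Count each PORT_GROUP_STATE_ENUM_* value in one showTable dump."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return Counter(STATE_RE.findall(fh.read()))


def compare(before: Counter, after: Counter) -> Dict[str, Tuple[int, int]]:
    """Return {state: (count_before, count_after)} for every state seen in either dump."""
    states = sorted(set(before) | set(after))
    # Indexing a Counter with a missing key yields 0.
    return {s: (before[s], after[s]) for s in states}


# Usage: python3 compare_ls_states.py LSSb4.txt LSSafter.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    diff = compare(state_counts(sys.argv[1]), state_counts(sys.argv[2]))
    for state, (b, a) in diff.items():
        print(f"{state}: {b} -> {a}")
```

A drop in the PORT_GROUP_STATE_ENUM_FAILEDFORDELETE count between the two files is the change the script is meant to produce; the Manager view remains the authoritative place to confirm the Config State.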

If the Config State still shows Failed after the resync command and the environment is running NSX 4.2.0:

  • Edit the corresponding segment in Policy mode.
  • Edit its description. You can enter text and then remove it if you want to leave the description blank.
  • Save the configuration and refresh the view.
  • Confirm that the Logical Switch now shows Config State: Success in the Manager mode view.

To fix individual Logical Switches in NSX 4.1.x, download del_dvpg_4.1.x_3407553_0916.py and run it with the following syntax:

  • #python3 del_dvpg_4.1.x_3407553_0916.py dvpgport_info.txt <logical_switch_id>

If issues persist after running the script, open a Broadcom Support case, attach the dvpgport_info.txt, LSSb4.txt, and LSSafter.txt files, and reference this KB article.

Attachments

del_dvpg_4.1.x_3407553_0916.py
del_dvpg_4.1.x_3407553_0916_allLS.py
del_dvpg_4.0.x_3407553_0801.py