NSX 4.x
For automated recovery tools to process any change to NSX logical switches, such as removing an old segment, a defined backend mapping for the vDS-to-Transport-Zone association is required.
In some corner cases, the required association between the vDS and the Transport Zone is missing from the transport nodes, and therefore stale entries cannot be automatically cleared.
If any stale entries are detected, the logical switch as a whole is considered to be in a FAILED state until those entries are removed.
Workaround:
Note: The following steps require care and precision to execute correctly. If in doubt, please open a support case so that a technical support engineer can implement them.
Step 1: Run the following command to identify the affected vdsIdPortGroupStateMap value:
# corfu_tool_runner.py -n nsx -t LogicalSwitchState -o showTable | grep "PORT_GROUP_STATE_ENUM_FAILEDFORDELETE" -B 30
Payload:
{
  "managedResource": {
    "displayName": "########-####-####-####-##########f7"
  },
  ...
  ...
  "logicalSwitchDisplayName": "NSX-SWITCH-VLAN100",
  "transportZoneId": {
    "uuid": {
      "left": "####################",
      "right": "####################"
    }
  },
  "stateStatus": "CONFIG_STATUS_FAILED",
  "lastUpdated": "1742816816555",
  "switchType": "LOGICAL_SWITCH_TYPE_DEFAULT",
  "vdsIdPortGroupStateMap": {
    "50 04 c2 a6 ## ## ## ea-c8 ## ## ## be 1f 5b 07": {
      "portGroup": {
        "cmId": "########-####-####-####-##########5d",
        "portGroupKey": "dvportgroup-######"
      },
      "state": "PORT_GROUP_STATE_ENUM_SUCCESS"
    },
    "50 04 5a 18 ## ## ## 5c-8e ## ## ## 7b 35 32 56": {      <==== HERE
      "state": "PORT_GROUP_STATE_ENUM_FAILEDFORDELETE"
The entry marked HERE is the vdsIdPortGroupStateMap value where it had failed to delete previously.
The cleanup script may be run to clear the stale entry manually from the NSX database.
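As an illustration of what Step 1 looks for, the following minimal Python sketch scans a LogicalSwitchState payload that has already been parsed into a dictionary and returns the vDS IDs whose state is PORT_GROUP_STATE_ENUM_FAILEDFORDELETE. The function name find_stale_vds_ids and the sample payload are hypothetical and are not part of any NSX tooling; the sketch only assumes the field layout shown in the output above.

```python
STALE_STATE = "PORT_GROUP_STATE_ENUM_FAILEDFORDELETE"


def find_stale_vds_ids(payload):
    """Return the vDS IDs in vdsIdPortGroupStateMap marked FAILEDFORDELETE.

    Hypothetical helper: assumes `payload` is a dict with the same layout
    as the LogicalSwitchState payload shown in Step 1.
    """
    state_map = payload.get("vdsIdPortGroupStateMap", {})
    return [
        vds_id
        for vds_id, entry in state_map.items()
        if entry.get("state") == STALE_STATE
    ]


# Hypothetical sample modeled on the masked output above.
sample_payload = {
    "logicalSwitchDisplayName": "NSX-SWITCH-VLAN100",
    "stateStatus": "CONFIG_STATUS_FAILED",
    "vdsIdPortGroupStateMap": {
        "50 04 c2 a6 ## ## ## ea-c8 ## ## ## be 1f 5b 07": {
            "portGroup": {"portGroupKey": "dvportgroup-######"},
            "state": "PORT_GROUP_STATE_ENUM_SUCCESS",
        },
        "50 04 5a 18 ## ## ## 5c-8e ## ## ## 7b 35 32 56": {
            "state": STALE_STATE,
        },
    },
}

if __name__ == "__main__":
    # Print each vDS ID that would need to be passed to the cleanup script.
    for vds_id in find_stale_vds_ids(sample_payload):
        print(vds_id)
```

The healthy entry carries a portGroup block and a SUCCESS state, while the stale entry has only the FAILEDFORDELETE state, so filtering on the state field alone is enough to pick out the key to clean up.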
Step 2: Copy the cleanup script, delete_lsstate_stale_vds.py, to one NSX Manager.
Step 3: Run the script with the vDS ID taken from the vdsIdPortGroupStateMap, first with --read_only and then with --cleanup:
python3 ./delete_lsstate_stale_vds.py --vds_id "50 04 5a 18 ## ## ## 5c-8e ## ## ## 7b 35 32 56" --read_only
python3 ./delete_lsstate_stale_vds.py --vds_id "50 04 5a 18 ## ## ## 5c-8e ## ## ## 7b 35 32 56" --cleanup
50 04 5a 18 ## ## ## 5c-8e ## ## ## 7b 35 32 56 vDS identified in the corfu_tool_runner.py command run in Step 1 being repaired/removed, or the Transport Nodes related to it.
Example:
YYYY-MM-DD HH:MM:SS,sss [INFO] LogicalSwitch ########-####-####-####-##########f7 needs stale dvs cleanup, LSState {'managedResource': {'displayName': '########-####-####-####-##########f7'}, 'logicalSwitchRevison': 3, 'ccpRealizedRevison': -1, 'portGroupStateRevision': 3, 'opaqueNetworkOnComputeManagerStateRevision': -1, 'logicalSwitchDisplayName': 'NSX-SWITCH-VLAN100', 'transportZoneId': {'uuid': {'left': '####################', 'right': '####################'}}, 'stateStatus': 'CONFIG_STATUS_FAILED', 'lastUpdated': '1742816816555', 'switchType': 'LOGICAL_SWITCH_TYPE_DEFAULT', 'vdsIdPortGroupStateMap': {'50 04 c2 a6 ## ## ## ea-c8 ## ## ## be 1f 5b 07': {'portGroup': {'cmId': '########-####-####-####-##########5d', 'portGroupKey': 'dvportgroup-######'}, 'state': 'PORT_GROUP_STATE_ENUM_SUCCESS'}, '50 04 5a 18 ## ## ## 5c-8e ## ## ## 7b 35 32 56': {'state': 'PORT_GROUP_STATE_ENUM_FAILEDFORDELETE'}}}
If the --read_only output matches the logical switches/stale entries identified in Step 1 above, proceed with running the --cleanup script.
After running the cleanup script, the environment should correct itself via the next vcFullSync cycle, which takes place automatically and frequently, and should occur within about 5 minutes at most.
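The comparison between the --read_only output and the vDS ID found in Step 1 can be sketched in Python as follows. This is a minimal illustration, not part of the cleanup script: the helper names, the regular expression, and the shortened sample line (including the placeholder switch ID ls-example) are assumptions based only on the example log line shown above, and the real output format may differ.

```python
import ast
import re

STALE_STATE = "PORT_GROUP_STATE_ENUM_FAILEDFORDELETE"

# Assumed line format, taken from the example --read_only output:
#   ... LogicalSwitch <id> needs stale dvs cleanup, LSState {<python dict>}
LOG_PATTERN = re.compile(
    r"LogicalSwitch (?P<ls_id>\S+) needs stale dvs cleanup, "
    r"LSState (?P<state>\{.*\})"
)


def stale_ids_from_log_line(line):
    """Return (logical switch ID, stale vDS IDs), or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    # The LSState portion is printed as a Python dict literal (single quotes),
    # so ast.literal_eval can parse it without executing any code.
    ls_state = ast.literal_eval(match.group("state"))
    state_map = ls_state.get("vdsIdPortGroupStateMap", {})
    stale = [
        vds_id
        for vds_id, entry in state_map.items()
        if entry.get("state") == STALE_STATE
    ]
    return match.group("ls_id"), stale


def read_only_matches(line, expected_vds_id):
    """True if the log line reports expected_vds_id as a stale entry."""
    parsed = stale_ids_from_log_line(line)
    return parsed is not None and expected_vds_id in parsed[1]


# vDS ID identified in Step 1 (masked, as in the article).
EXPECTED_VDS_ID = "50 04 5a 18 ## ## ## 5c-8e ## ## ## 7b 35 32 56"

# Hypothetical, shortened log line modeled on the example output.
sample_line = (
    "YYYY-MM-DD HH:MM:SS,sss [INFO] LogicalSwitch ls-example needs stale dvs cleanup, "
    "LSState {'logicalSwitchDisplayName': 'NSX-SWITCH-VLAN100', "
    "'ccpRealizedRevison': -1, 'vdsIdPortGroupStateMap': {"
    "'50 04 c2 a6 ## ## ## ea-c8 ## ## ## be 1f 5b 07': "
    "{'state': 'PORT_GROUP_STATE_ENUM_SUCCESS'}, "
    "'50 04 5a 18 ## ## ## 5c-8e ## ## ## 7b 35 32 56': "
    "{'state': 'PORT_GROUP_STATE_ENUM_FAILEDFORDELETE'}}}"
)

if __name__ == "__main__":
    print(read_only_matches(sample_line, EXPECTED_VDS_ID))
```

A check of this kind only confirms that the --read_only run reports the same stale vDS ID that was found in Step 1; the decision to proceed with --cleanup remains a manual one.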