VMware Aria Operations (vROPs) alarm "Logical Switch State has failed" observed for NSX segments


Article ID: 385560


Updated On:

Products

VMware NSX

Issue/Introduction

  • VMware Aria Operations (vROPs) reports a critical NSX alarm stating that a "Logical Switch State has failed".
  • In the NSX UI, under Networking -> Segments, the segment has a 'Status' of 'Success'.
  • In Manager view, under Networking -> Logical Switches, the logical switch is in a 'Failed' state.
    • If Manager view is not enabled, go to System -> General Settings -> User Interface Mode Toggle -> Edit and change "Toggle Visibility" to allow either Admin or All Users.
  • The following API call for the segment reported as failed in vROPs shows the state as 'failed' (a scripted version of this check is sketched after this list): 
    GET /policy/api/v1/infra/segments/<segment ID>/state
    GET /policy/api/v1/infra/segments/test-segment/state
    {
        "segment_path": "/infra/segments/test-segment",
        "state": "failed",
        "details": []
    }
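For larger environments, the same check can be scripted against the NSX policy API. The following is only a sketch under assumed details: the manager FQDN, the credentials (basic authentication with a local user), and the example segment ID test-segment are placeholders.

    import requests
    import urllib3

    urllib3.disable_warnings()                       # lab convenience; keep certificate checks in production

    NSX_MANAGER = "nsx-mgr.example.com"              # placeholder NSX Manager FQDN
    SEGMENT_ID = "test-segment"                      # placeholder segment ID

    url = f"https://{NSX_MANAGER}/policy/api/v1/infra/segments/{SEGMENT_ID}/state"
    resp = requests.get(url, auth=("admin", "<password>"), verify=False)
    resp.raise_for_status()
    print(resp.json().get("state"))                  # prints 'failed' for an affected segment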

Environment

VMware NSX-T Data Center

VMware NSX

 

Cause

The failed state occurs when a vSphere Distributed Switch (vDS) referenced by NSX no longer exists in the vCenter inventory. The failed state has no functional impact on the dataplane.

Resolution

This issue is resolved in VMware NSX 9.0, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

 

Workaround

Note: The following steps require care and precision to execute correctly. If in doubt, please open a support case, and a technical support engineer can implement them.

Step 1: Confirm which vDS exist in vCenter and their UUID

  1. In the vSphere Client inventory, change to the Networking view.
  2. Select the vDS; its moid is shown in the browser URL bar. In this example, Test-Switch has a moid of dvs-2040052:

 

  3. Open a new browser tab and use the URL https://VCENTER_FQDN_or_IP/mob/?moid=<value identified above, e.g. dvs-2040052>:
     The UUID of the vDS is 50 1d 74 40 ## ## ## b1-bd ## ## ## 41 fc 21 6f

   

  4. Compile a list of UUIDs for all vDS. Note: if there are multiple Compute Managers configured, check all vCenters. An optional scripted way to gather this list is sketched below.
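As an optional alternative to browsing the MOB for each switch, a short pyVmomi sketch along the following lines can list every vDS name and UUID in one pass. This is only a sketch under assumed details: the vCenter FQDN and credentials are placeholders, and it requires the pyvmomi package.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()               # lab convenience only
    si = SmartConnect(host="vcenter.example.com",        # placeholder vCenter FQDN
                      user="administrator@vsphere.local",
                      pwd="<password>",
                      sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.DistributedVirtualSwitch], True)
        for dvs in view.view:
            # The uuid property matches the format shown on the MOB page
            print(f"{dvs.name}: {dvs.uuid}")
    finally:
        Disconnect(si)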


Step 2: Remediation

  1. Take an NSX FTP backup (ensure the backup passphrase is known) and a backup of vCenter.

  2. Copy the attached script, delete_lsstate_stale_vds.py, to one NSX Manager.

  3. As root on the NSX Manager CLI, run the script to detect the stale vDS entries:
    python3 ./delete_lsstate_stale_vds.py --detect_stale_vds
    #Stale VDS ID
    -----------------------------------------------
    50 03 59 ## ## ## ## ## ## ## ## ## ## ## 84 e9
    50 25 5f ## ## ## ## ## ## ## ## ## ## ## 5e 40
    50 25 db ## ## ## ## ## ## ## ## ## ## ## e0 cc
    50 03 76 ## ## ## ## ## ## ## ## ## ## ## d9 bd
    50 03 0b ## ## ## ## ## ## ## ## ## ## ## 81 db


  4. Run the script in read-only mode, passing one of the stale vDS IDs listed in the previous step:
    python3 ./delete_lsstate_stale_vds.py  --vds_id "<VDS ID>" --read_only
    python3 ./delete_lsstate_stale_vds.py  --vds_id "50 03 59 ## ## ## ## ## ## ## ## ## ## ## 84 e9" --read_only

  5. Once the read-only run has completed successfully, and after confirming that none of the flagged IDs belong to a vDS still present in vCenter (see the cross-check sketch after this procedure), proceed with the actual cleanup:
    python3 ./delete_lsstate_stale_vds.py --vds_id "<Stale vDS UUID>" --cleanup
    python3 ./delete_lsstate_stale_vds.py --vds_id "50 03 59 ## ## ## ## ## ## ## ## ## ## ## 84 e9" --cleanup

  6. Run the following command from the root shell on all three Managers and allow about 10 minutes for the change to propagate to the UI (a final API-based check is described below):
    start search resync all
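
Before running the cleanup in step 5, it is worth cross-checking that none of the IDs reported by the detection step still correspond to a vDS present in vCenter (from the list compiled in Step 1). The following is a minimal sketch, assuming both lists have been saved to text files with one UUID per line; the file names are hypothetical. Note that the MOB shows the UUID with a dash in the middle, so the comparison normalizes that away.

    # File names are hypothetical; each file holds one vDS UUID per line.
    def norm(uuid):
        # Normalize "50 1d ... b1-bd ..." and "50 1d ... b1 bd ..." to a single form
        return uuid.strip().replace("-", "").replace(" ", "").lower()

    with open("stale_vds_from_nsx.txt") as f:
        stale = {norm(line) for line in f if line.strip()}
    with open("vds_in_vcenter.txt") as f:
        in_vcenter = {norm(line) for line in f if line.strip()}

    overlap = stale & in_vcenter
    if overlap:
        # A flagged ID still exists in vCenter: do not clean it up; open a support case instead.
        print("Do NOT clean up, still present in vCenter:", overlap)
    else:
        print("None of the flagged vDS IDs exist in vCenter; cleanup can proceed.")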

Note: If a valid vDS is deleted from the NSX DB in error, the vDS will remain in vCenter. It can be reloaded into the NSX DB by restarting the inventory service on all three Managers as the admin CLI user:
     restart service cm-inventory
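
Finally, once the resync from step 6 has completed and the UI reflects the change, the segment state API call shown in the Issue/Introduction section (GET /policy/api/v1/infra/segments/<segment ID>/state) can be re-run as a confirmation; a segment that has been cleaned up should no longer report a 'failed' state.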


Attachments

delete_lsstate_stale_vds.py