Traffic flow goes down/black-holed after powering off one of the active NSX-T Edge Nodes in a Federated environment
Article ID: 312621
Products
VMware NSX Networking
Issue/Introduction
Symptoms:
You are running NSX-T version 3.1.x, 3.2.x or 4.x.
You are running an NSX-T Federated environment.
You recently powered off an NSX-T Edge Node ungracefully from the vCenter UI, or the NSX-T Edge Node crashed.
Traffic on stretched NSX segments is routed to the offline NSX-T Edge Node because the host still shows that node's VTEP as active instead of removing it.
You are experiencing traffic disruption on the NSX-T Edge Nodes.
If you run the command get logical-switch [logical switch UUID] vtep-group as the admin user on an NSX-T Edge Node after the affected NSX-T Edge Node has been powered off, you will see output similar to the following:
Wed May 10 2023 UTC 19:12:37.973
VTEP Group Label: 45057
Type: Gateway
HA Type: Active/Standby
Activeness Proto: Activeness Notification
HA State Sync (ms): 32312
Active Mbr: 1
Label: 46081
VTEP IP: 192.168.1.151
VTEP MAC: 0a:00:08:2f:55:a6
State: 1 <==== Active
BFD Count: 0
Label: 109569
VTEP IP: 192.168.1.152
VTEP MAC: 0a:00:08:37:10:ae
State: 1 <==== Active
BFD Count: 0
Note: You will only be able to run the above command successfully if your NSX-T environment is Federated.
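The check described above can be scripted. The following is a minimal Python sketch, not a VMware tool, that parses captured vtep-group output and lists the members reporting State: 1. The function names are hypothetical, the field layout is taken only from the sample output in this article, and treating more than one active member in an Active/Standby group as suspicious is an assumption based on that sample.

```python
import re

# Matches one member entry as it appears in the sample output of this article.
# \s+ also matches newlines, so the pattern works on wrapped or flattened text.
MEMBER_RE = re.compile(
    r"Label:\s*(?P<label>\d+)\s+"
    r"VTEP IP:\s*(?P<ip>\S+)\s+"
    r"VTEP MAC:\s*(?P<mac>\S+)\s+"
    r"State:\s*(?P<state>\d+)"
)


def parse_vtep_members(output):
    """Return one dict per VTEP member found in the captured CLI output."""
    return [
        {
            "label": int(m["label"]),
            "ip": m["ip"],
            "mac": m["mac"],
            "state": int(m["state"]),
        }
        for m in MEMBER_RE.finditer(output)
    ]


def active_members(output):
    """Return only the members whose State is 1 (shown as Active above)."""
    return [m for m in parse_vtep_members(output) if m["state"] == 1]


# Sample text copied from the output shown in this article.
SAMPLE = (
    "VTEP Group Label: 45057 Type: Gateway HA Type: Active/Standby "
    "Activeness Proto: Activeness Notification HA State Sync (ms): 32312 "
    "Active Mbr: 1 "
    "Label: 46081 VTEP IP: 192.168.1.151 VTEP MAC: 0a:00:08:2f:55:a6 "
    "State: 1 BFD Count: 0 "
    "Label: 109569 VTEP IP: 192.168.1.152 VTEP MAC: 0a:00:08:37:10:ae "
    "State: 1 BFD Count: 0"
)

if __name__ == "__main__":
    active = active_members(SAMPLE)
    for member in active:
        print(member["ip"], member["mac"])
    # Assumption: an Active/Standby group is expected to have a single active
    # member, so more than one may indicate the stale state described here.
    if len(active) > 1:
        print("More than one member reports State: 1")
```

To use it, paste the captured command output into a string and pass it to active_members(). The group-level Label line is skipped because it is not followed by a VTEP IP field.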
Environment
VMware NSX-T Data Center
VMware NSX-T Data Center 4.x
Cause
The Central Control Plane (CCP) aggregates all the VTEP IPs from the VtepGroupStateMsgs reported by the NSX-T Edge Nodes and publishes them in VtepGroupMsgs without modification. If an active NSX-T Edge Node crashes or is shut down ungracefully, it cannot report its latest status. The CCP therefore continues to treat the VTEP IPs on that NSX-T Edge Node as active and keeps publishing them, so traffic is still sent to the offline node and is disrupted.
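The behaviour described above can be illustrated with a toy model. The Python sketch below is not NSX code: the class, the function, and the edge names are hypothetical, the pairing of VTEP IPs with edges is invented for illustration, and the value 0 for a non-active state is an assumption (the article only shows State: 1). It shows why a control plane that republishes the last report from each edge, unmodified, keeps advertising a crashed edge as active.

```python
ACTIVE = 1   # matches "State: 1" in the article's sample output
STANDBY = 0  # assumed value for a non-active member; not shown in the article


class ToyControlPlane:
    """Toy stand-in for the CCP: it remembers the last report from each edge."""

    def __init__(self):
        self._last_report = {}  # edge id -> {vtep_ip: state}

    def report(self, edge_id, vtep_states):
        """Store the latest state message received from one edge."""
        self._last_report[edge_id] = dict(vtep_states)

    def publish(self):
        """Merge every stored report without modifying or expiring any entry."""
        merged = {}
        for states in self._last_report.values():
            merged.update(states)
        return merged


def simulate(graceful):
    """Model edge-1 going down and edge-2 taking over."""
    ccp = ToyControlPlane()
    ccp.report("edge-1", {"192.168.1.151": ACTIVE})
    ccp.report("edge-2", {"192.168.1.152": STANDBY})
    if graceful:
        # A graceful shutdown gives edge-1 the chance to send a final report.
        ccp.report("edge-1", {"192.168.1.151": STANDBY})
    # After a crash or ungraceful power-off, edge-1 sends nothing further.
    ccp.report("edge-2", {"192.168.1.152": ACTIVE})
    return ccp.publish()


if __name__ == "__main__":
    print("graceful:  ", simulate(graceful=True))
    print("ungraceful:", simulate(graceful=False))
```

In the ungraceful case both VTEP IPs are published as active, which corresponds to the two State: 1 entries in the symptom output. In this model, the stale entry is corrected only when edge-1 reports again, which is consistent with the workaround of powering the node back on.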
Resolution
This is a known issue impacting NSX-T Data Center.
Workaround: Power on the NSX-T Edge Node that was ungracefully powered off to restore normal traffic flow.