Traffic flow goes down/black-holed after powering off one of the active NSX-T Edge Nodes in a Federated environment

Article ID: 312621

Updated On:

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

In an NSX-T Federated environment, an active NSX-T Edge Node has gone offline ungracefully, either because it was powered off through the vCenter UI or because it crashed.

Consequently, traffic on stretched NSX segments is still forwarded to the offline Edge Node, because the host continues to show that node's VTEP state as active instead of removing it. As a result, this traffic is dropped (black-holed).

To confirm this issue, run the command "get logical-switch [logical switch UUID] vtep-group" as the admin user on an NSX-T Edge Node that is still running, after the affected Edge Node has been powered off. The output will be similar to the following:

Wed May 10 2023 UTC 19:12:37.973
VTEP Group Label: 45057
Type: Gateway
HA Type: Active/Standby
Activeness Proto: Activeness Notification
HA State Sync (ms): 32312
Active Mbr: 1
      Label: 46081
      VTEP IP: 192.168.1.151
      VTEP MAC: 0a:00:08:2f:55:a6
      State: 1 <==== Active
      BFD Count: 0
      Label: 109569
      VTEP IP: 192.168.1.152
      VTEP MAC: 0a:00:08:37:10:ae
      State: 1 <==== Active
      BFD Count: 0
 
Note: The above command only runs in a Federated deployment.
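
The check above can also be scripted against saved command output. The following Python sketch is illustrative only and is not a VMware tool: it parses text in the format shown above and flags an Active/Standby group that lists more than one member with "State: 1". It assumes, based on the annotations in the sample, that "State: 1" means active. The function names parse_vtep_group and has_multiple_active_members are hypothetical.

```python
# Sample taken from the output shown in this article.
SAMPLE = """\
VTEP Group Label: 45057
Type: Gateway
HA Type: Active/Standby
Active Mbr: 1
      Label: 46081
      VTEP IP: 192.168.1.151
      VTEP MAC: 0a:00:08:2f:55:a6
      State: 1 <==== Active
      BFD Count: 0
      Label: 109569
      VTEP IP: 192.168.1.152
      VTEP MAC: 0a:00:08:37:10:ae
      State: 1 <==== Active
      BFD Count: 0
"""


def parse_vtep_group(text):
    """Parse vtep-group output into the HA type and a list of members."""
    group = {"ha_type": None, "members": []}
    for raw in text.splitlines():
        # Drop hand-written annotations such as '<==== Active'.
        line = raw.split("<", 1)[0].strip()
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "HA Type":
            group["ha_type"] = value
        elif key == "Label":
            # A bare 'Label' line starts a new member entry.
            group["members"].append({"Label": value})
        elif group["members"] and key in ("VTEP IP", "State"):
            group["members"][-1][key] = value
    return group


def has_multiple_active_members(group):
    """True if an Active/Standby group lists more than one active member."""
    # Assumption: 'State: 1' marks an active member, as annotated in the sample.
    active = [m for m in group["members"] if m.get("State") == "1"]
    return group["ha_type"] == "Active/Standby" and len(active) > 1


suspect = has_multiple_active_members(parse_vtep_group(SAMPLE))
```

For the sample output in this article, both members report "State: 1" in an Active/Standby group, so the check flags the group as suspect.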



Environment

VMware NSX-T Data Center
VMware NSX

Cause

The Central Control Plane (CCP) aggregates the VTEP IPs from the VtepGroupStateMsgs reported by the NSX-T Edge Nodes and publishes VtepGroupMsgs without modification. If an active NSX-T Edge Node crashes or is shut down ungracefully, it cannot report its latest status. The CCP therefore continues to treat the VTEP IPs on that Edge Node as active and keeps publishing them, which results in traffic disruption.
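
The behavior described above can be illustrated with a toy model. The Python sketch below is purely conceptual and is not NSX code: ToyControlPlane and its methods are hypothetical names, and the model assumes that the standby Edge Node reports itself as active after taking over. It shows only why a control plane that republishes the last state it received keeps advertising the VTEP of a node that went silent.

```python
class ToyControlPlane:
    """Conceptual stand-in for the aggregation described above; not NSX code."""

    def __init__(self):
        # Edge name -> (VTEP IP, active flag) from the last report received.
        self._last_report = {}

    def report(self, edge, vtep_ip, active):
        """Store the latest state message received from an Edge Node."""
        self._last_report[edge] = (vtep_ip, active)

    def publish(self):
        """Return the VTEP IPs advertised as active, exactly as last reported."""
        return sorted(ip for ip, active in self._last_report.values() if active)


ccp = ToyControlPlane()
ccp.report("edge-1", "192.168.1.151", active=True)   # active node
ccp.report("edge-2", "192.168.1.152", active=False)  # standby node

# edge-1 crashes here and sends no further report.
# Assumption: edge-2 takes over and reports itself as active.
ccp.report("edge-2", "192.168.1.152", active=True)

# The stale entry for edge-1 is still published alongside edge-2.
published_after_crash = ccp.publish()
```

In this model, published_after_crash contains both VTEP IPs, mirroring the two active members in the sample output. Had edge-1 been able to send a final report with active=False, only 192.168.1.152 would be published.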


Resolution

This issue is resolved in NSX 4.1.2 and later.

Workaround:
Power on the NSX-T Edge Node that was ungracefully powered off to restore normal traffic flow.