Traffic flow goes down/black-holed after powering off one of the active NSX-T Edge Nodes in a Federated environment

Article ID: 312621

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • You are running NSX-T version 3.1.x, 3.2.x or 4.x.
  • You are running an NSX-T Federated environment.
  • You recently powered off an NSX-T Edge Node from the vCenter UI (an ungraceful power-down), or the NSX-T Edge Node crashed.
  • Traffic on stretched NSX segments is routed to the offline NSX-T Edge Node because the host still shows its VTEP state as active instead of removing the VTEP.
  • You are experiencing traffic disruption on the NSX-T Edge Nodes.
  • If you run the command get logical-switch [logical switch UUID] vtep-group as the admin user on an NSX-T Edge Node that is still running, after the other NSX-T Edge Node was powered off, you will observe output similar to the following:
Wed May 10 2023 UTC 19:12:37.973
VTEP Group Label: 45057
Type: Gateway
HA Type: Active/Standby
Activeness Proto: Activeness Notification
HA State Sync (ms): 32312
Active Mbr: 1
      Label: 46081
      VTEP IP: 192.168.1.151
      VTEP MAC: 0a:00:08:2f:55:a6
      State: 1 <==== Active
      BFD Count: 0
      Label: 109569
      VTEP IP: 192.168.1.152
      VTEP MAC: 0a:00:08:37:10:ae
      State: 1 <==== Active
      BFD Count: 0

 
Note: You will only be able to run the above command successfully if your NSX-T environment is Federated.
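
The check against the output above can also be scripted. The following Python sketch is illustrative only and is not part of any NSX-T tooling. It assumes the text format shown in the sample output, that State: 1 means active (as the annotations in the sample indicate), and that more than one active member in an Active/Standby group corresponds to the symptom described here. The function names are hypothetical.

```python
import re

# Sample text modeled on the output shown in this article.
SAMPLE = """\
VTEP Group Label: 45057
Type: Gateway
HA Type: Active/Standby
Active Mbr: 1
      Label: 46081
      VTEP IP: 192.168.1.151
      VTEP MAC: 0a:00:08:2f:55:a6
      State: 1 <==== Active
      BFD Count: 0
      Label: 109569
      VTEP IP: 192.168.1.152
      VTEP MAC: 0a:00:08:37:10:ae
      State: 1 <==== Active
      BFD Count: 0
"""

MEMBER_KEYS = ("VTEP IP", "VTEP MAC", "State", "BFD Count")


def parse_vtep_members(text):
    """Return one dict per VTEP member found in vtep-group output."""
    members = []
    current = None
    for raw in text.splitlines():
        # Drop trailing annotations such as "<==== Active".
        line = raw.split("<", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Label":
            # A bare "Label" line starts a new member entry.
            current = {"Label": value}
            members.append(current)
        elif current is not None and key in MEMBER_KEYS:
            current[key] = value
    return members


def active_vtep_ips(members):
    """List the VTEP IPs whose State is 1 (assumed to mean active)."""
    return [m["VTEP IP"] for m in members if m.get("State") == "1"]


def looks_stale(text):
    """True if an Active/Standby group lists more than one active member."""
    match = re.search(r"^HA Type:\s*(.+)$", text, flags=re.MULTILINE)
    ha_type = match.group(1).strip() if match else ""
    active = active_vtep_ips(parse_vtep_members(text))
    return ha_type == "Active/Standby" and len(active) > 1


if __name__ == "__main__":
    print(active_vtep_ips(parse_vtep_members(SAMPLE)))
    print(looks_stale(SAMPLE))
```

To use the sketch, paste the captured command output into a string and pass it to looks_stale(). A True result only suggests that the symptom in this article may be present; it is not a definitive diagnosis.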


Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 4.x

Cause

The Central Control Plane (CCP) aggregates all the VTEP IPs from the VtepGroupStateMsgs reported by the NSX-T Edge Nodes and publishes VtepGroupMsgs without modification. However, if an active NSX-T Edge Node crashes or is shut down ungracefully, it cannot report its latest status. The CCP therefore still considers the VTEP IPs on that NSX-T Edge Node active and continues to publish them. As a result, traffic is still sent to the offline NSX-T Edge Node and is disrupted.
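
The behavior described above can be illustrated with a toy model. This Python sketch is not the CCP implementation. The class ToyCcp, its methods, the Edge Node names, and the mapping of VTEP IPs to Edge Nodes are all hypothetical. It only shows why an aggregator that republishes the last reported state keeps a crashed node's VTEP in the published set.

```python
class ToyCcp:
    """Toy aggregator: remembers the last state each Edge Node reported."""

    def __init__(self):
        # Edge Node name -> (VTEP IP, active flag) from its latest report.
        self.last_report = {}

    def receive_state(self, edge, vtep_ip, active):
        """Record a state report (stands in for a VtepGroupStateMsg)."""
        self.last_report[edge] = (vtep_ip, active)

    def publish(self):
        """Return the active VTEP IPs exactly as last reported."""
        return sorted(ip for ip, active in self.last_report.values() if active)


def graceful_failover():
    ccp = ToyCcp()
    ccp.receive_state("edge-a", "192.168.1.151", True)
    ccp.receive_state("edge-b", "192.168.1.152", False)
    # edge-a shuts down cleanly and reports inactive; edge-b takes over.
    ccp.receive_state("edge-a", "192.168.1.151", False)
    ccp.receive_state("edge-b", "192.168.1.152", True)
    return ccp.publish()


def ungraceful_failover():
    ccp = ToyCcp()
    ccp.receive_state("edge-a", "192.168.1.151", True)
    ccp.receive_state("edge-b", "192.168.1.152", False)
    # edge-a crashes and sends no further report; edge-b takes over.
    ccp.receive_state("edge-b", "192.168.1.152", True)
    return ccp.publish()


if __name__ == "__main__":
    print(graceful_failover())    # ['192.168.1.152']
    print(ungraceful_failover())  # ['192.168.1.151', '192.168.1.152']
```

In the ungraceful case the stale entry for edge-a is still published next to edge-b. This mirrors the two members shown with State: 1 in the sample output in the Symptoms section.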


Resolution

This is a known issue impacting NSX-T Data Center.

Workaround:
Powering on the NSX-T Edge Node that was ungracefully powered off restores normal traffic flow.