Tier-0 Gateway Displays "Down" Status due to the BFD Admin Status

Article ID: 433438

Products

VMware NSX

Issue/Introduction

  • The Tier-0 (T0) Gateway displays a "Down" operational status within the NSX Manager GUI.
  • Multiple alarms are triggered on the T0 Gateway.
  • North-South traffic routing is disrupted.
  • Under Networking > Connectivity > Tier-0 Gateways, the gateway status is shown as "Down".
  • Checking the Edge Cluster health from the Edge Node CLI reports Routing Down:

sdcvdinsxedg1> get edge-cluster status
High Availability State   : Active
          Since           : <timestamp>
Edge Node Id              : <UUID>
Edge Node Status          : Up (Routing Down)
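
When the same check has to be repeated across several Edge Nodes, the status output above can be screened with a small script. The following is a minimal sketch, assuming the output has already been collected as text (for example from a saved CLI session). The function names and the sample text are illustrative and not part of NSX.

```python
def parse_edge_cluster_status(text):
    """Turn 'Key : Value' lines from the status output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # e.g. the CLI prompt line, which has no colon
        fields[key.strip()] = value.strip()
    return fields


def routing_is_down(fields):
    """True when the Edge Node Status value carries the 'Routing Down' marker."""
    return "Routing Down" in fields.get("Edge Node Status", "")


# Sample text mirroring the output shown above (placeholders kept as-is).
SAMPLE_STATUS = """\
sdcvdinsxedg1> get edge-cluster status
High Availability State   : Active
          Since           : <timestamp>
Edge Node Id              : <UUID>
Edge Node Status          : Up (Routing Down)
"""

status_fields = parse_edge_cluster_status(SAMPLE_STATUS)
if routing_is_down(status_fields):
    print("Edge node reports Routing Down; check BFD and routing state.")
```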

Environment

VMware NSX

Cause

Bidirectional Forwarding Detection (BFD) was enabled (Admin Status set to "Up") for a Static Route Peer on the Tier-0 Gateway; however, the upstream physical router was not configured with a matching BFD configuration.

Since the BFD session failed to establish due to this upstream mismatch, the NSX Edge withdrew the static routes associated with that BFD peer. This route withdrawal cascaded into a Routing Down state for the Edge Cluster, ultimately causing the Tier-0 Gateway to display a "Down" status in the UI.
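
The dependency described above can be summarized as a simple decision rule. The sketch below is a simplified model for illustration only, not NSX's actual implementation. The function and parameter names are hypothetical.

```python
def static_route_installed(bfd_admin_up: bool, bfd_session_up: bool) -> bool:
    """Simplified model of whether a BFD-protected static route stays installed.

    bfd_admin_up   -- BFD is enabled (Admin Status "Up") for the static route peer
    bfd_session_up -- the BFD session with the upstream router is established
    """
    if not bfd_admin_up:
        # BFD is not required, so the route does not depend on any session.
        return True
    # BFD is required: the route is kept only while the session is up.
    return bfd_session_up


# The situation in this article: BFD enabled on NSX, no BFD on the upstream router.
print(static_route_installed(bfd_admin_up=True, bfd_session_up=False))   # False
# After disabling BFD for the peer, the route no longer depends on the session.
print(static_route_installed(bfd_admin_up=False, bfd_session_up=False))  # True
```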

This was further confirmed by capturing BFD control traffic (UDP port 3784) on the associated Edge Node with the following command:

start capture interface <interface_UUID> direction dual expression host 192.xx.xx.xx and port 3784

<timestamp> 00:50:56:xx:xx:xx > 00:22:bd:xx:xx:xx, ethertype IPv4 (0x0800), length 66: 192.xx.xx.xx.60280 > 192.xx.xx.xx.3784: BFDv1, Control, State Down, Flags: [none], length: 24
<timestamp> 00:50:56:xx:xx:xx > 00:22:bd:xx:xx:xx, ethertype IPv4 (0x0800), length 66: 192.xx.xx.xx.60280 > 192.xx.xx.xx.3784: BFDv1, Control, State Down, Flags: [none], length: 24
<timestamp> 00:50:56:xx:xx:xx > 00:22:bd:xx:xx:xx, ethertype IPv4 (0x0800), length 66: 192.xx.xx.xx.60280 > 192.xx.xx.xx.3784: BFDv1, Control, State Down, Flags: [none], length: 24

The above capture shows outgoing BFDv1 control packets continuously reporting State Down, indicating the session cannot be established with the upstream peer.
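
For longer captures, the same reading can be automated by counting BFD control packets per sender and reported state. The following is a minimal sketch, assuming the capture was saved as text in the format shown above. The regular expression, the function name, and the sample addresses (192.0.2.10 for the Edge uplink, 192.0.2.1 for the upstream router) are assumptions for illustration and are not taken from the affected environment.

```python
import re
from collections import Counter

# Matches the IP/port pair and BFD state in lines like the capture above.
BFD_LINE = re.compile(
    r"(?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): "
    r"BFDv1, Control, State (?P<state>\w+)"
)


def summarize_bfd(capture_text):
    """Count BFD control packets per (source IP, reported state)."""
    counts = Counter()
    for line in capture_text.splitlines():
        match = BFD_LINE.search(line)
        if match:
            counts[(match["src"], match["state"])] += 1
    return counts


# Hypothetical sample in the same format as the capture shown above.
SAMPLE_CAPTURE = """\
10:00:00.000001 00:50:56:aa:bb:cc > 00:22:bd:dd:ee:ff, ethertype IPv4 (0x0800), length 66: 192.0.2.10.60280 > 192.0.2.1.3784: BFDv1, Control, State Down, Flags: [none], length: 24
10:00:01.000001 00:50:56:aa:bb:cc > 00:22:bd:dd:ee:ff, ethertype IPv4 (0x0800), length 66: 192.0.2.10.60280 > 192.0.2.1.3784: BFDv1, Control, State Down, Flags: [none], length: 24
10:00:02.000001 00:50:56:aa:bb:cc > 00:22:bd:dd:ee:ff, ethertype IPv4 (0x0800), length 66: 192.0.2.10.60280 > 192.0.2.1.3784: BFDv1, Control, State Down, Flags: [none], length: 24
"""

bfd_counts = summarize_bfd(SAMPLE_CAPTURE)
bfd_senders = {src for src, _state in bfd_counts}
for (src, state), count in sorted(bfd_counts.items()):
    print(f"{src}: {count} packet(s) reporting State {state}")
if "192.0.2.1" not in bfd_senders:
    print("No BFD packets seen from the upstream peer.")
```

If only the Edge address appears as a sender and every packet reports State Down, that is consistent with an upstream router that is not running BFD for this peer, though it does not prove it on its own.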

Resolution

To restore routing, disable BFD for the static route peer on the Tier-0 Gateway so that its Admin Status is no longer "Up". NSX then installs the static route regardless of the BFD session state.

If BFD is required for fast failure detection in your environment, coordinate with the network team to configure matching BFD parameters on the upstream physical router. Once the upstream configuration is verified, you can safely toggle the Admin Status back to "Up" in NSX.

Additional Information

Related KB: Tier-0 Gateway reports status "DOWN" instead of "DEGRADED" during partial Edge Node failure