NSX Edge BGP session drops during failover in a Tier-0 VRF active-standby configuration

Article ID: 436338

Products

VMware NSX

Issue/Introduction

  • In a Tier-0 (T0) active-standby configuration on a cluster of two NSX Edge nodes, you may observe that when one of the Edge nodes (either active or standby) is shut down, the Edge node that remains active loses its BGP session.
  • During the failover, the BGP peer configured on the remaining active Edge node is still reachable via ping.
  • Packet captures show that the TCP connection on the BGP port (TCP 179) is established successfully and a BGP OPEN message is sent.
  • However, the Edge node immediately follows this with a TCP RST packet, which brings the BGP peering down.
  • The frr.log on the Edge node contains the following error message:

    Connection from #.#.#.# rejected due to admin shutdown
  • Edge nodes do not have any TEP tunnels up.
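To confirm the frr.log symptom across many lines, a small scan such as the one below can help. This is a minimal sketch, not a VMware tool: it assumes you have a copy of frr.log available locally (the path is supplied by you; no default location is assumed), and it matches only the message text quoted above. The function names are hypothetical.

```python
import re
from collections import Counter

# Pattern for the frr.log message quoted in this article. The peer address is
# captured; the character class accepts dotted IPv4 and colon-separated IPv6 text.
ADMIN_SHUTDOWN_RE = re.compile(
    r"Connection from (?P<peer>[0-9A-Fa-f:.]+) rejected due to admin shutdown"
)


def find_admin_shutdown_rejections(lines):
    """Return peer addresses from matching log lines, in order of appearance."""
    peers = []
    for line in lines:
        match = ADMIN_SHUTDOWN_RE.search(line)
        if match:
            peers.append(match.group("peer"))
    return peers


def count_by_peer(lines):
    """Count rejected BGP connections per peer address."""
    return dict(Counter(find_admin_shutdown_rejections(lines)))


def scan_log_file(path):
    """Read a copy of frr.log from a caller-supplied path and count rejections."""
    # errors="replace" keeps the scan going if the file has stray non-UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as log:
        return count_by_peer(log)
```

For example, `scan_log_file("frr.log")` returns a dictionary mapping each rejected peer address to the number of matching lines. A non-empty result, combined with no TEP tunnels up, points to the cause described below.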

Environment

VMware NSX

Cause

The NSX datapath relies on GENEVE TEP (Tunnel Endpoint) tunnels to synchronize state and forward overlay traffic. If an Edge node detects that it has zero active TEP tunnels, it assumes it has been isolated from the NSX fabric. To prevent split-brain scenarios or traffic blackholing, the Edge node triggers a failsafe that administratively shuts down its external interfaces. This forcefully drops the BGP peering and explains the "rejected due to admin shutdown" message in frr.log.
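The decision described above can be illustrated with a toy model. This is a conceptual sketch of the behavior as stated in this article, not NSX source code and not an NSX API; the class, field, and method names are hypothetical, and the real Edge logic involves more inputs than a single tunnel count.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeNodeModel:
    """Toy model of the TEP-isolation failsafe (hypothetical, not NSX code)."""
    name: str
    up_tep_tunnels: int
    external_ifaces_admin_down: bool = False
    log: list = field(default_factory=list)

    def evaluate_fabric_isolation(self):
        # Zero TEP tunnels up is treated as isolation from the NSX fabric,
        # so the external interfaces are marked administratively down.
        self.external_ifaces_admin_down = self.up_tep_tunnels == 0

    def accept_bgp_connection(self, peer):
        # While external interfaces are admin down, incoming BGP connections
        # are refused, mirroring the frr.log message quoted in this article.
        if self.external_ifaces_admin_down:
            self.log.append(f"Connection from {peer} rejected due to admin shutdown")
            return False
        return True
```

In this model, an Edge node created with `up_tep_tunnels=0` rejects every BGP connection after `evaluate_fabric_isolation()` runs, even though the peer itself is reachable, which matches the symptom that ping succeeds while BGP stays down.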

Resolution

Refer to the following KB article for the resolution steps:
Newly created NSX BGP Neighbors are down (errors: "Connection from #.#.#.# rejected due to admin shutdown")