HA Failover Between NSX Edges Does Not Occur When BGP Over IPSec VTI Tunnel Goes Idle

Article ID: 419064

Updated On:

Products

VMware NSX

Issue/Introduction

  • When BGP runs over IPSec tunnels on VTI interfaces (without true NSX uplinks), HA failover between NSX Edge nodes may not occur when the BGP neighbor on the active Edge goes down or becomes idle.

  • Connectivity disruptions occur when running BGP over an IPSec tunnel if the tunnel goes down. Despite BGP being configured on the tunnel, traffic does not fail over to the standby Edge.

  • FRR logs (NSX Edge log at /var/log/frr/frr.log) display Next Hop Tracking (NHT) messages similar to:
    BGP: %ADJCHANGE: neighbor <IP>(Unknown) in vrf default Down Waiting for NHT

  • The “Status” column in the BGP neighbor table under the Tier-0 Gateway reflects the current state of the BGP session (e.g., Success, Idle, Established).

  • The BGP neighbor summary also shows the neighbor in Idle state rather than Established ("Estab").
    *Refer to Troubleshooting NSX BGP for the commands used to check BGP sessions.
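
To confirm the symptom quickly, the FRR log can be scanned for the NHT message quoted above. The following is a minimal sketch, not a supported tool: the regular expression is modeled only on the sample line in this article, so other FRR versions may word the message differently, and the neighbor address 192.0.2.1 is a placeholder from the documentation range.

```python
import re

# Pattern modeled on the sample message quoted in this article (an assumption:
# other FRR releases may format the line differently).
NHT_DOWN = re.compile(
    r"%ADJCHANGE: neighbor (?P<neighbor>\S+?)\((?P<host>[^)]*)\) "
    r"in vrf (?P<vrf>\S+) Down Waiting for NHT"
)


def find_nht_down(lines):
    """Return (neighbor, vrf) pairs for each 'Down Waiting for NHT' message."""
    hits = []
    for line in lines:
        match = NHT_DOWN.search(line)
        if match:
            hits.append((match.group("neighbor"), match.group("vrf")))
    return hits


# Placeholder input; in practice, pass the lines of /var/log/frr/frr.log.
sample = [
    "BGP: %ADJCHANGE: neighbor 192.0.2.1(Unknown) in vrf default Down Waiting for NHT",
    "some unrelated log line",
]
print(find_nht_down(sample))  # [('192.0.2.1', 'default')]
```

On an Edge node, the same function could be fed `open("/var/log/frr/frr.log")` instead of the sample list.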

Cause

While BGP neighbor state can contribute to HA failover, it only does so when all BGP peers across all NSX uplinks on the Edge node are lost.

VTI interfaces used with IPSec VPN tunnels are handled differently from NSX uplinks. They are excluded from the HA failover decision process, and their BGP status is not evaluated during failover conditions.

If BGP is configured solely over these VPN-based interfaces, changes in session state, such as a neighbor becoming idle due to tunnel failure, will not initiate HA failover.
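
The rule described above can be illustrated with a small model. This is a simplified sketch of the stated behavior only, not NSX's actual implementation: the `BgpPeer` type, the `"uplink"`/`"vti"` labels, and the peer addresses are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class BgpPeer:
    address: str
    interface_kind: str  # "uplink" or "vti" (hypothetical labels)
    established: bool


def bgp_triggers_failover(peers):
    """Simplified model: only BGP peers on NSX uplinks are evaluated."""
    uplink_peers = [p for p in peers if p.interface_kind == "uplink"]
    # With no uplink peers there is nothing to evaluate, so BGP state
    # alone never triggers failover (the VTI-only case in this article).
    if not uplink_peers:
        return False
    # Failover is triggered only when every uplink peer is lost.
    return not any(p.established for p in uplink_peers)


vti_only = [BgpPeer("198.51.100.1", "vti", established=False)]
uplinks_down = [
    BgpPeer("192.0.2.1", "uplink", established=False),
    BgpPeer("192.0.2.2", "uplink", established=False),
]
one_uplink_up = [
    BgpPeer("192.0.2.1", "uplink", established=True),
    BgpPeer("198.51.100.1", "vti", established=False),
]

print(bgp_triggers_failover(vti_only))       # False
print(bgp_triggers_failover(uplinks_down))   # True
print(bgp_triggers_failover(one_uplink_up))  # False
```

The explicit empty-list check matters: without it, "all uplink peers are down" would be vacuously true for a VTI-only Edge, which is the opposite of the behavior described here.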

Resolution

This is expected behavior. Refer to Troubleshooting NSX Edge High Availability, which explains the conditions under which NSX Edge HA failover is triggered.

Additional Information

Customers who require failover in these scenarios can implement a workaround by configuring static routing with Bidirectional Forwarding Detection (BFD) enabled. Refer to the instructions at Configure NSX BFD.

  • BFD provides rapid detection of forwarding path failures between Edge nodes and their BGP peers.

  • When a failure is detected, BFD can trigger a routing change that allows traffic to fail over to the standby Edge, maintaining connectivity and minimizing disruption.
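
To give a sense of what "rapid" means here: in BFD's asynchronous mode (RFC 5880), a session is declared down after roughly the detection multiplier times the negotiated interval passes without a control packet. The sketch below shows that arithmetic; the 500 ms interval and multiplier of 3 are illustrative assumptions, not values taken from this article.

```python
def bfd_detection_time_ms(negotiated_interval_ms, detect_multiplier):
    """Approximate BFD detection time: multiplier x negotiated interval."""
    return negotiated_interval_ms * detect_multiplier


# Illustrative values only (assumptions, not confirmed NSX settings).
print(bfd_detection_time_ms(500, 3))  # 1500
```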