On pre-emptive failback, the new standby T0 router has BGP peering stuck in Active
Article ID: 336812
Products
VMware NSX
Issue/Introduction
Symptoms:
Tier-0 Logical Router configured in active-standby mode with pre-emptive mode enabled
After a failback from the non-preferred node to the preferred node, BGP peerings on the new standby node are stuck in the Active state
During this time, BGP commands such as "get bgp neighbor summary" return no output
The issue resolves itself after a 20-minute timeout period, after which the BGP sessions return to the Established state
The issue is observed only on failback of the T0 logical router, not on failover
The issue does not impact the data path, because the affected T0 router is in standby mode
Edge log messages similar to the following may be observed:
<179>1 2019-11-05T15:09:23.614Z EDGE NSX 904 - [nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="1449" level="ERROR" errorCode="MPAERR_MSR_QUERY_BGP_NEIGHBOR"] [UpdateFrrBgpNeighbor] Cannot get bgp-neighbor for lrouter:
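To check whether an Edge node is logging this error, the log can be scanned for the errorCode shown in the sample line. The Python sketch below assumes the RFC 5424-style layout inferred from that single sample. The regular expressions, the names parse_line and find_bgp_neighbor_errors, and the BgpNeighborQueryError type are hypothetical and are not part of any NSX tool.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional

TARGET_ERROR = "MPAERR_MSR_QUERY_BGP_NEIGHBOR"

# Layout inferred from the single sample line (an assumption):
# <PRI>1 TIMESTAMP HOST APP PROCID - [SD-PARAMS] MESSAGE
LOG_RE = re.compile(
    r'^<(?P<pri>\d+)>1 (?P<ts>\S+) (?P<host>\S+) (?P<app>\S+) (?P<procid>\S+) - '
    r'\[(?P<sd>[^\]]*)\]\s*(?P<msg>.*)$'
)
# key="value" pairs inside the structured-data block
SD_PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class BgpNeighborQueryError:
    timestamp: datetime
    host: str
    level: str
    message: str


def parse_line(line: str) -> Optional[BgpNeighborQueryError]:
    """Return a record if the line carries the BGP-neighbor query error, else None."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    params = dict(SD_PARAM_RE.findall(m.group("sd")))
    if params.get("errorCode") != TARGET_ERROR:
        return None
    # datetime.fromisoformat() before Python 3.11 does not accept a trailing "Z".
    ts = datetime.fromisoformat(m.group("ts").replace("Z", "+00:00"))
    return BgpNeighborQueryError(ts, m.group("host"), params.get("level", ""), m.group("msg"))


def find_bgp_neighbor_errors(lines: Iterable[str]) -> List[BgpNeighborQueryError]:
    """Collect every matching record from an iterable of log lines."""
    return [rec for rec in map(parse_line, lines) if rec is not None]
```

Matches timestamped shortly after a failback, on the node that has just become standby, are consistent with the symptoms listed above.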
Environment
VMware NSX-T Data Center 2.x
Cause
The BGP and BFD processes both connect to the routing platform to query BFD updates and status. When their queries arrive at exactly the same time, the routing platform serves only the BFD client. The BGP process then hangs until a 20-minute watchdog timeout restarts it, which resolves the issue.
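The starvation and watchdog recovery described in the Cause can be illustrated with a toy model. This Python sketch is not NSX code: the serve and simulate functions, the one-minute simulation step, and the rule "simultaneous queries are answered for the BFD client only" are hypothetical simplifications of the behavior described in this article. Only the 20-minute watchdog timeout comes from the article.

```python
from datetime import datetime, timedelta
from typing import Dict, Set

# Taken from the article; everything else in this model is hypothetical.
WATCHDOG_TIMEOUT = timedelta(minutes=20)


def serve(requests: Set[str]) -> Set[str]:
    """Toy routing platform: if BGP and BFD query at the same instant,
    only the BFD client receives an answer."""
    if "bfd" in requests and len(requests) > 1:
        return {"bfd"}
    return set(requests)


def simulate(failback: datetime,
             step: timedelta = timedelta(minutes=1)) -> Dict[str, datetime]:
    """Return the simulated time at which each client is first served."""
    requests = {"bgp", "bfd"}          # both clients query at failback
    served_at: Dict[str, datetime] = {}
    hung_since: Dict[str, datetime] = {}

    now = failback
    answered = serve(requests)
    for client in answered:
        served_at[client] = now
    for client in requests - answered:
        hung_since[client] = now       # unanswered client blocks

    # Advance simulated time until the watchdog has restarted every hung client.
    while hung_since:
        now += step
        for client, since in list(hung_since.items()):
            if now - since >= WATCHDOG_TIMEOUT:
                # Watchdog restarts the hung process; it re-queries on its own.
                del hung_since[client]
                if client in serve({client}):
                    served_at[client] = now
    return served_at


if __name__ == "__main__":
    t0 = datetime(2019, 11, 5, 15, 9)
    for client, t in sorted(simulate(t0).items()):
        print(client, "served after", t - t0)
    # prints:
    # bfd served after 0:00:00
    # bgp served after 0:20:00
```

In this model the BFD client is answered immediately, while the BGP client stays blocked until the watchdog fires, which matches the roughly 20-minute recovery described in the Symptoms.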
Resolution
This is a known issue affecting VMware NSX-T Data Center 2.x.
Workaround: To prevent this issue from occurring, disable pre-emptive mode on the Tier-0 logical router.
Alternatively, no action is required: the BGP sessions on the standby T0 router recover automatically after the 20-minute timeout period.