VMware NSX
This is expected behavior, by design.
When two NSX Edge Nodes are configured in Active/Standby, both Edges will establish and maintain peering and route updates with upstream BGP Peers. During the initial Fail Over, traffic may experience a brief (a few seconds or less) connectivity outage as traffic moves from the Preferred Edge to the Non-Preferred Edge.
During Fail Back, in order for traffic to properly move from the Non-Preferred Edge to the Preferred Edge, the Non-Preferred Edge will drop BGP peering for 30 seconds. This may incur another brief outage as traffic re-establishes connections via the Preferred Edge Node.
This can be confirmed via the NSX Edge CLI on the Non-Preferred Edge at the moment of Fail Back.
Non-Preferred (Standby) Edge is Up and BGP is established before Fail Over:
edge02(tier0_sr[2])> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
192.###.###.254     64800  Estab  02:10:43     NC   1418    1339     17     2
192.###.###.254     64800  Estab  02:10:43     NC   1417    1334     17     2
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv6Unicast
Router ID: 192.###.###.2  Local AS: 65000
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
fd00:#:#:#::#:84fe  64800  Estab  02:10:43     NC   7947    7861     16     1
fd00:#:#:#::#:85fe  64800  Estab  02:10:43     NC   7944    7859     16     1
Thu Mar 27 2025 UTC 19:23:46.617
Confirm Non-Preferred Edge is not active:
edge02(tier0_sr[2])> get high-availability status
Thu Mar 27 2025 UTC 19:24:23.779
Service Router
UUID : 83cc####-####-####-####-#######11675
state : Standby  ←Current State of Edge is Standby, waiting to take over if necessary.
type : TIER0
mode : A/S
failover mode : Preemptive
rank : 1
service count : 0
service score : 0
HA ports state
    UUID : dc72####-####-####-####-#######c3ae4
    op_state : Down  ←This Edge is in a Down State - Failover has NOT occurred.
    addresses : 169.###.###.2/24;fe80:#:#:#:#:5300/64
Peer Routers
    SR UUID : fa47####-####-####-####-#######d56ac
    Node UUID : f0ae####-####-####-####-#######d4639
    HA state : Active  ←Preferred Edge is Up and online.
Fail Over has occurred, Non-Preferred Edge is now Active:
edge02(tier0_sr[2])> get high-availability status
Thu Mar 27 2025 UTC 19:26:06.061
Service Router
UUID : 83cc####-####-####-####-#######11675
state : Active  ←Current State of Edge is Active, it has taken over traffic.
type : TIER0
mode : A/S
failover mode : Preemptive
rank : 1
service count : 0
service score : 0
HA ports state
    UUID : dc72####-####-####-####-#######c3ae4
    op_state : Up  ←This Edge is in an Up State - Failover has occurred.
    addresses : 169.###.###.2/24;fe80:#:#:#:#:5300/64
Peer Routers
    SR UUID : fa47####-####-####-####-#######d56ac
    Node UUID : f0ae####-####-####-####-#######d4639
    HA state : Unreachable  ←Preferred Edge is unreachable, justifying failover actions.
The Non-Preferred Edge has become Active with no change in BGP peering. The sessions have been established for over two hours (Up/DownTime 02:13:24), so any outage would have been short-lived.
edge02(tier0_sr[2])> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
192.###.###.254     64800  Estab  02:13:24     NC   1421    1343     17     2
192.###.###.254     64800  Estab  02:13:24     NC   1420    1338     17     2
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv6Unicast
Router ID: 192.###.###.2  Local AS: 65000
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
fd00:#:#:#::#:84fe  64800  Estab  02:13:24     NC   7965    7878     16     1
fd00:#:#:#::#:85fe  64800  Estab  02:13:24     NC   7962    7876     16     1
Thu Mar 27 2025 UTC 19:26:27.592
When the Preferred Edge returns to service, the Fail Back breaks BGP peering on the Non-Preferred Edge for 30 seconds (as designed):
Before Fail Back:
edge02(tier0_sr[2])> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
192.###.###.254     64800  Estab  02:18:01     NC   1428    1349     17     2
192.###.###.254     64800  Estab  02:18:01     NC   1427    1344     17     2
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv6Unicast
Router ID: 192.###.###.2  Local AS: 65000
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
fd00:#:#:#::#:84fe  64800  Estab  02:18:01     NC   7994    7907     16     1
fd00:#:#:#::#:85fe  64800  Estab  02:18:01     NC   7991    7905     16     1
Thu Mar 27 2025 UTC 19:31:05.075
Fail Back has begun. Non-Preferred Edge BGP Peering is down and will remain down for 30 seconds:
edge02(tier0_sr[2])> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
192.###.###.254     64800  Idle   00:00:00     NC   1428    1351     0      0
192.###.###.254     64800  Idle   00:00:00     NC   1427    1346     0      0
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv6Unicast
Router ID: 192.###.###.2  Local AS: 65000
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
fd00:#:#:#::#:84fe  64800  Idle   00:00:00     NC   7994    7909     0      0
fd00:#:#:#::#:85fe  64800  Idle   00:00:00     NC   7991    7907     0      0
Thu Mar 27 2025 UTC 19:31:06.068
edge02(tier0_sr[2])> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
192.###.###.254     64800  Idle   00:00:29     NC   1428    1351     0      0
192.###.###.254     64800  Idle   00:00:29     NC   1427    1346     0      0
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv6Unicast
Router ID: 192.###.###.2  Local AS: 65000
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
fd00:#:#:#::#:84fe  64800  Idle   00:00:29     NC   7994    7909     0      0
fd00:#:#:#::#:85fe  64800  Idle   00:00:29     NC   7991    7907     0      0
Thu Mar 27 2025 UTC 19:31:35.179
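When many captures have to be reviewed, the State column can be checked with a short script instead of by eye. The following is a minimal sketch, not an NSX tool: it assumes the output has been saved as text with one neighbor per line and an Up/DownTime in HH:MM:SS form. The function names and the placeholder addresses (192.0.2.254, 2001:db8::84fe) are hypothetical and stand in for the masked values in this article.

```python
import re

# One neighbor row: address, AS, state, Up/DownTime, BFD, InMsgs, OutMsgs, InPfx, OutPfx.
# Assumption: Up/DownTime is printed as HH:MM:SS, as in the captures in this article.
ROW = re.compile(
    r"^(?P<neighbor>\S+)\s+(?P<asn>\d+)\s+(?P<state>\S+)\s+"
    r"(?P<updown>\d+:\d+:\d+)\s+(?P<bfd>\S+)\s+"
    r"(?P<in_msgs>\d+)\s+(?P<out_msgs>\d+)\s+(?P<in_pfx>\d+)\s+(?P<out_pfx>\d+)\s*$"
)


def parse_bgp_summary(text):
    """Return one dict per neighbor row found in pasted CLI output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header and banner lines do not match and are skipped
            rows.append(match.groupdict())
    return rows


def all_neighbors_in_state(rows, state):
    """True when at least one row was parsed and every row has the given state."""
    return bool(rows) and all(row["state"] == state for row in rows)


# Placeholder addresses; the real ones are masked in the captures.
SAMPLE = """\
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
192.0.2.254         64800  Idle   00:00:29     NC   1428    1351     0      0
2001:db8::84fe      64800  Idle   00:00:29     NC   7994    7909     0      0
"""

rows = parse_bgp_summary(SAMPLE)
print(all_neighbors_in_state(rows, "Idle"))
```

During the Fail Back hold period every neighbor on the Non-Preferred Edge is expected to report Idle, so a result of True for "Idle" at that moment matches the designed behavior rather than indicating a fault.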
After this 30-second period, traffic has already failed back to the Preferred Edge with minimal to no outage, and BGP is re-established on the Non-Preferred Edge Node:
edge02(tier0_sr[2])> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
192.###.###.254     64800  Estab  00:00:02     NC   1430    1353     0      0
192.###.###.254     64800  Estab  00:00:02     NC   1429    1348     0      0
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv6Unicast
Router ID: 192.###.###.2  Local AS: 65000
Neighbor            AS     State  Up/DownTime  BFD  InMsgs  OutMsgs  InPfx  OutPfx
fd00:#:#:#::#:84fe  64800  Estab  00:00:01     NC   7996    7911     0      0
fd00:#:#:#::#:85fe  64800  Estab  00:00:01     NC   7993    7909     0      0
Thu Mar 27 2025 UTC 19:31:37.072
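The length of the peering outage can be estimated from the capture timestamps and the Up/DownTime column. The sketch below is illustrative only: it subtracts each Up/DownTime from the time of its capture to approximate when the state began, using the first Idle capture (19:31:06.068, 00:00:00) and the IPv4 rows of the re-established capture (19:31:37.072, 00:00:02). The helper names are hypothetical, and because Up/DownTime has one-second resolution the result is accurate to about a second.

```python
import re
from datetime import datetime, timedelta

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Matches the CLI timestamp form used in this article, e.g.
# "Thu Mar 27 2025 UTC 19:31:06.068".
STAMP = re.compile(
    r"\w{3} (\w{3}) (\d{1,2}) (\d{4}) UTC (\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


def parse_stamp(line):
    """Convert a CLI timestamp line into a datetime (millisecond precision)."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    month, day, year, hour, minute, second, millis = match.groups()
    return datetime(int(year), MONTHS[month], int(day),
                    int(hour), int(minute), int(second), int(millis) * 1000)


def parse_updown(text):
    """Convert an HH:MM:SS Up/DownTime value into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Approximate moment each state began: capture time minus Up/DownTime.
idle_began = parse_stamp("Thu Mar 27 2025 UTC 19:31:06.068") - parse_updown("00:00:00")
estab_began = parse_stamp("Thu Mar 27 2025 UTC 19:31:37.072") - parse_updown("00:00:02")

window = estab_began - idle_began
print(window.total_seconds())  # prints 29.004
```

A result of roughly 29 seconds is consistent with the 30-second hold described above, given the one-second granularity of the Up/DownTime column.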
Non-Preferred Edge has returned to original High Availability State:
edge02(tier0_sr[2])> get high-availability status
Thu Mar 27 2025 UTC 20:07:20.064
Service Router
UUID : 83cc####-####-####-####-#######11675
state : Standby  ←Non-Preferred Edge has returned to Standby state, waiting to take over if necessary.
type : TIER0
mode : A/S
failover mode : Preemptive
rank : 1
service count : 0
service score : 0
HA ports state
    UUID : dc72####-####-####-####-#######c3ae4
    op_state : Down
    addresses : 169.###.###.2/24;fe80:#:#:#:#:5300/64
Peer Routers
    SR UUID : fa47####-####-####-####-#######d56ac
    Node UUID : f0ae####-####-####-####-#######d4639
    HA state : Active  ←Preferred Edge is once again Up and online.
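The high-availability status output can be reduced to a simple pass/fail check in the same way. This is a hedged sketch rather than a supported utility: it assumes the output is saved as text with one "key : value" field per line, strips any "←" annotations like those added in this article, and keeps the first occurrence of a repeated key such as UUID. The function names and the sample UUIDs are hypothetical placeholders.

```python
import re

# A field line looks like "key : value"; keys contain only letters, spaces and "_",
# so timestamp lines (which contain digits before the colon) are not matched.
FIELD = re.compile(r"^\s*([A-Za-z_ ]+?)\s*:\s+(.*)$")


def parse_ha_status(text):
    """Return a dict of fields from pasted 'get high-availability status' output."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        key, value = match.groups()
        value = value.split("←", 1)[0].strip()  # drop explanatory annotations
        fields.setdefault(key.strip(), value)   # keep the first value per key
    return fields


def has_failed_back(fields):
    """True when this Edge is Standby again and its peer reports Active."""
    return fields.get("state") == "Standby" and fields.get("HA state") == "Active"


# Placeholder UUIDs; the real ones are masked in the captures.
SAMPLE = """\
Thu Mar 27 2025 UTC 20:07:20.064
Service Router
UUID : 00000000-0000-0000-0000-000000000001
state : Standby  ←returned to Standby
type : TIER0
mode : A/S
failover mode : Preemptive
HA ports state
    UUID : 00000000-0000-0000-0000-000000000002
    op_state : Down
Peer Routers
    SR UUID : 00000000-0000-0000-0000-000000000003
    HA state : Active
"""

fields = parse_ha_status(SAMPLE)
print(has_failed_back(fields))
```

The check deliberately looks only at the local state and the peer HA state, since those are the two fields this article uses to distinguish the Standby, Active, and failed-back conditions.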