In a Tier-0 router configured for Active/Standby High-Availability mode with Preemptive Fail Over, the standby Edge's Tier-0 loses BGP sessions after the primary Edge Node reboots or comes out of maintenance mode.

Article ID: 392361

Products

VMware NSX

Issue/Introduction

  • Tier-0 is configured in Active/Standby mode
  • Tier-0 Fail Over is configured for Preemptive
  • A Fail Over has occurred, moving traffic from the Preferred Edge Node to the Non-Preferred Edge Node.
  • When the Preferred (primary) Edge Node reboots or comes out of maintenance mode, a BGP outage is experienced on the Non-Preferred (standby) Edge Node.

Environment

VMware NSX

Resolution

This is expected behavior. 

When two NSX Edge Nodes are configured in Active/Standby, both Edges will establish and maintain peering and route updates with upstream BGP Peers. During the initial Fail Over, traffic may experience a brief (a few seconds or less) connectivity outage as traffic moves from the Preferred Edge to the Non-Preferred Edge.

During Fail Back, in order for traffic to properly move from the Non-Preferred Edge to the Preferred Edge, the Non-Preferred Edge will drop BGP peering for 30 seconds. This may incur another brief outage as traffic re-establishes connections via the Preferred Edge Node. 

This can be confirmed via the NSX Edge CLI on the Non-Preferred Edge at the moment of Fail Back.

 

Non-Preferred (Standby) Edge is Up and BGP is established before Fail Over:

edge02(tier0_sr[2])>  get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                            AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

192.###.###.254                     64800       Estab 02:10:43     NC  1418    1339    17     2
192.###.###.254                     64800       Estab 02:10:43     NC  1417    1334    17     2

BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv6Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                        AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

fd00:#:#:#::#:84fe              64800       Estab 02:10:43     NC  7947    7861    16     1
fd00:#:#:#::#:85fe              64800       Estab 02:10:43     NC  7944    7859    16     1

Thu Mar 27 2025 UTC 19:23:46.617
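
When several captures like the one above need to be compared, the neighbor rows can be pulled out of pasted output with a few lines of Python. This is a minimal sketch, not an NSX tool: it assumes the nine-column row layout shown in this article, and the names `Neighbor`, `parse_bgp_summary`, and `all_established` are hypothetical helpers introduced here for illustration.

```python
from typing import NamedTuple

# Sample copied from the IPv4 part of the capture in this article.
SAMPLE = """\
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                            AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

192.###.###.254                     64800       Estab 02:10:43     NC  1418    1339    17     2
192.###.###.254                     64800       Estab 02:10:43     NC  1417    1334    17     2
"""


class Neighbor(NamedTuple):
    address: str
    remote_as: int
    state: str
    up_down_time: str
    in_prefixes: int
    out_prefixes: int


def parse_bgp_summary(text: str) -> list[Neighbor]:
    """Extract neighbor rows from pasted 'get bgp neighbor summary' output."""
    neighbors = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a neighbor row has 9 columns with a numeric AS in the
        # second column. Legend, header, and timestamp lines fail this check.
        if len(fields) == 9 and fields[1].isdigit():
            neighbors.append(
                Neighbor(fields[0], int(fields[1]), fields[2], fields[3],
                         int(fields[7]), int(fields[8]))
            )
    return neighbors


def all_established(neighbors: list[Neighbor]) -> bool:
    """True only if at least one neighbor was parsed and all are in Estab."""
    return bool(neighbors) and all(n.state == "Estab" for n in neighbors)


if __name__ == "__main__":
    rows = parse_bgp_summary(SAMPLE)
    for row in rows:
        print(row.address, row.state, row.up_down_time)
    print("all established:", all_established(rows))
```

Running the same helper against the later captures makes the transition from `Estab` to `Idle` and back easy to spot without reading each table by eye.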

 

Confirm Non-Preferred Edge is not active:

edge02(tier0_sr[2])> get high-availability status
Thu Mar 27 2025 UTC 19:24:23.779
Service Router
UUID                  : 83cc####-####-####-####-#######11675
state                 : Standby                                ←Current State of Edge is Standby, waiting to take over if necessary.
type                  : TIER0
mode                  : A/S
failover mode         : Preemptive
rank                  : 1
service count         : 0
service score         : 0
HA ports state
    UUID        : dc72####-####-####-####-#######c3ae4
    op_state    : Down                                         ←This Edge is in a Down State - Failover has NOT occurred.
    addresses   : 169.###.###.2/24;fe80:#:#:#:#:5300/64
Peer Routers
    SR UUID     : fa47####-####-####-####-#######d56ac
    Node UUID   : f0ae####-####-####-####-#######d4639
    HA state    : Active                                       ←Preferred Edge is Up and online.

 

Fail Over has occurred, Non-Preferred Edge is now Active:

edge02(tier0_sr[2])> get high-availability status
Thu Mar 27 2025 UTC 19:26:06.061
Service Router
UUID                  : 83cc####-####-####-####-#######11675
state                 : Active                                 ←Current State of Edge is Active, it has taken over traffic.
type                  : TIER0
mode                  : A/S
failover mode         : Preemptive
rank                  : 1
service count         : 0
service score         : 0
HA ports state
    UUID        : dc72####-####-####-####-#######c3ae4
    op_state    : Up                                           ←This Edge is in an Up State - Failover has occurred.
    addresses   : 169.###.###.2/24;fe80:#:#:#:#:5300/64
Peer Routers
    SR UUID     : fa47####-####-####-####-#######d56ac
    Node UUID   : f0ae####-####-####-####-#######d4639
    HA state    : Unreachable                                  ←Preferred Edge is unreachable, justifying failover actions.
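
The fields that matter in the two captures above are `state`, `op_state`, `failover mode`, and the peer's `HA state`. As a rough illustration, the sketch below turns pasted `get high-availability status` output into a dictionary so those fields can be compared between captures. It assumes the `key : value` layout shown in this article and strips the "←" notes, which are annotations added by this article rather than CLI output. `parse_ha_status` is a hypothetical helper name.

```python
# Sample copied from the post-Fail-Over capture in this article.
SAMPLE = """\
Service Router
UUID                  : 83cc####-####-####-####-#######11675
state                 : Active                                 ←Current State of Edge is Active, it has taken over traffic.
type                  : TIER0
mode                  : A/S
failover mode         : Preemptive
HA ports state
    UUID        : dc72####-####-####-####-#######c3ae4
    op_state    : Up                                           ←This Edge is in an Up State - Failover has occurred.
Peer Routers
    HA state    : Unreachable                                  ←Preferred Edge is unreachable, justifying failover actions.
"""


def parse_ha_status(text: str) -> dict[str, str]:
    """Collect 'key : value' pairs from pasted HA status output."""
    status: dict[str, str] = {}
    for line in text.splitlines():
        # Drop the editorial "←..." annotations used in this article.
        line = line.split("←", 1)[0]
        key, sep, value = line.partition(" : ")
        if sep:
            # "UUID" appears twice (Service Router and HA port);
            # setdefault keeps the first occurrence.
            status.setdefault(key.strip(), value.strip())
    return status


if __name__ == "__main__":
    status = parse_ha_status(SAMPLE)
    print("local SR state:", status["state"])
    print("peer HA state: ", status["HA state"])
```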

 

The Non-Preferred Edge is now Active and its BGP peering is unchanged: the Up/DownTime of 02:13:24 shows the sessions have stayed established for over two hours, so any outage during Fail Over would have been short-lived.

edge02(tier0_sr[2])>  get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                            AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

192.###.###.254                     64800       Estab 02:13:24     NC  1421    1343    17     2
192.###.###.254                     64800       Estab 02:13:24     NC  1420    1338    17     2

BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv6Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                        AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

fd00:#:#:#::#:84fe              64800       Estab 02:13:24     NC  7965    7878    16     1
fd00:#:#:#::#:85fe              64800       Estab 02:13:24     NC  7962    7876    16     1

Thu Mar 27 2025 UTC 19:26:27.592

 

When the Preferred Edge returns to service, the Fail Back takes BGP peering on the Non-Preferred Edge down for 30 seconds (as designed):

Before Fail Back:

edge02(tier0_sr[2])>  get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                            AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

192.###.###.254                     64800       Estab 02:18:01     NC  1428    1349    17     2
192.###.###.254                     64800       Estab 02:18:01     NC  1427    1344    17     2

BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv6Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                        AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

fd00:#:#:#::#:84fe              64800       Estab 02:18:01     NC  7994    7907    16     1
fd00:#:#:#::#:85fe              64800       Estab 02:18:01     NC  7991    7905    16     1

Thu Mar 27 2025 UTC 19:31:05.075

 

Fail Back has begun. Non-Preferred Edge BGP Peering is down and will remain down for 30 seconds:

edge02(tier0_sr[2])>  get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                            AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

192.###.###.254                     64800       Idle  00:00:00     NC  1428    1351    0      0
192.###.###.254                     64800       Idle  00:00:00     NC  1427    1346    0      0

BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv6Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                        AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

fd00:#:#:#::#:84fe              64800       Idle  00:00:00     NC  7994    7909    0      0
fd00:#:#:#::#:85fe              64800       Idle  00:00:00     NC  7991    7907    0      0

Thu Mar 27 2025 UTC 19:31:06.068


edge02(tier0_sr[2])>  get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                            AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

192.###.###.254                     64800       Idle  00:00:29     NC  1428    1351    0      0
192.###.###.254                     64800       Idle  00:00:29     NC  1427    1346    0      0

BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv6Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                        AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

fd00:#:#:#::#:84fe              64800       Idle  00:00:29     NC  7994    7909    0      0
fd00:#:#:#::#:85fe              64800       Idle  00:00:29     NC  7991    7907    0      0

Thu Mar 27 2025 UTC 19:31:35.179

 

After this 30-second period, traffic has already failed back to the Preferred Edge with little or no outage, and BGP is re-established on the Non-Preferred Edge Node:

edge02(tier0_sr[2])>  get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                            AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

192.###.###.254                     64800       Estab 00:00:02     NC  1430    1353    0      0
192.###.###.254                     64800       Estab 00:00:02     NC  1429    1348    0      0

BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP summary information for VRF default for address-family: ipv6Unicast
Router ID: 192.###.###.2  Local AS: 65000

Neighbor                        AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

fd00:#:#:#::#:84fe              64800       Estab 00:00:01     NC  7996    7911    0      0
fd00:#:#:#::#:85fe              64800       Estab 00:00:01     NC  7993    7909    0      0

Thu Mar 27 2025 UTC 19:31:37.072
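
The CLI timestamps in the captures above bracket how long the sessions were actually down. The short calculation below is only a worked example using those four timestamps: the last capture showing `Estab` before Fail Back, the first and last captures showing `Idle`, and the first capture showing `Estab` again. The variable names are illustrative, and parsing the weekday and month names assumes an English locale.

```python
from datetime import datetime

# Timestamp layout as printed by the Edge CLI in this article.
FMT = "%a %b %d %Y UTC %H:%M:%S.%f"


def parse_ts(line: str) -> datetime:
    return datetime.strptime(line.strip(), FMT)


last_estab_before = parse_ts("Thu Mar 27 2025 UTC 19:31:05.075")
first_idle = parse_ts("Thu Mar 27 2025 UTC 19:31:06.068")
last_idle = parse_ts("Thu Mar 27 2025 UTC 19:31:35.179")
first_estab_after = parse_ts("Thu Mar 27 2025 UTC 19:31:37.072")

# The sessions were Idle at least from the first to the last Idle capture...
lower = (last_idle - first_idle).total_seconds()
# ...and at most from the last Estab capture to the next Estab capture.
upper = (first_estab_after - last_estab_before).total_seconds()

print(f"BGP was down for between {lower:.3f} s and {upper:.3f} s")
```

The resulting bounds, roughly 29.1 to 32.0 seconds, are consistent with the 30-second period described in this article.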

The Non-Preferred Edge has returned to its original High Availability state:

edge02(tier0_sr[2])> get high-availability status
Thu Mar 27 2025 UTC 20:07:20.064
Service Router
UUID                  : 83cc####-####-####-####-#######11675
state                 : Standby                                ←Non-Preferred Edge has returned to Standby state, waiting to take over if necessary.
type                  : TIER0
mode                  : A/S
failover mode         : Preemptive
rank                  : 1
service count         : 0
service score         : 0
HA ports state
    UUID        : dc72####-####-####-####-#######c3ae4
    op_state    : Down
    addresses   : 169.###.###.2/24;fe80:#:#:#:#:5300/64
Peer Routers
    SR UUID     : fa47####-####-####-####-#######d56ac
    Node UUID   : f0ae####-####-####-####-#######d4639
    HA state    : Active                                       ←Preferred Edge is once again Up and online.
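
Taken together, the captures suggest a simple signature for the expected behavior: the Tier-0 uses Preemptive Fail Over, the Non-Preferred Edge ends up back in Standby with its peer Active, and BGP on the Non-Preferred Edge was down for about 30 seconds. The sketch below encodes that check. It is an illustration only: the function name and the 5-second tolerance are assumptions made here, not values from NSX, and a result of `False` simply means the observation does not match the pattern this article describes.

```python
EXPECTED_DOWN_SECONDS = 30.0  # duration stated in this article


def matches_expected_failback(local_state: str,
                              peer_ha_state: str,
                              failover_mode: str,
                              bgp_down_seconds: float,
                              tolerance: float = 5.0) -> bool:
    """Return True if an observed BGP drop fits the Fail Back pattern above.

    tolerance is an assumed allowance for capture timing, not an NSX setting.
    """
    roles_restored = local_state == "Standby" and peer_ha_state == "Active"
    preemptive = failover_mode == "Preemptive"
    duration_ok = abs(bgp_down_seconds - EXPECTED_DOWN_SECONDS) <= tolerance
    return roles_restored and preemptive and duration_ok


if __name__ == "__main__":
    # Values taken from the captures in this article (about 29 s Idle).
    print(matches_expected_failback("Standby", "Active", "Preemptive", 29.1))
    # A much longer drop does not fit the pattern.
    print(matches_expected_failback("Standby", "Active", "Preemptive", 120.0))
```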

Additional Information