BGP neighbor goes down when one edge in a two-node edge cluster exits maintenance mode

Article ID: 367930

Products

VMware NSX

Issue/Introduction

The BGP neighborship between the edge and the TOR breaks when the second edge in the cluster exits maintenance mode.

Environment

The customer has a T0 Active-Active edge cluster with two edge nodes, and the edges are connected as follows:

         CSR
   |             |
  BGP           BGP
   |             |
 node1 --isr-- node2
 Active        Active

When the customer places one of the two edges (edge-1) in maintenance mode and then exits maintenance mode, the BGP neighborship of the other edge (edge-2) breaks and moves from the Established state to the Idle state.

 

Cause

Suppose each edge has two VTEPs: X1 and X2 on one edge, Y1 and Y2 on the other. Two tunnels are created for HA: (X1, Y1) and (X2, Y2). These two tunnels are excluded when evaluating the "All Tunnels Down" scenario; that is, a node-down event is not triggered if these are the only two tunnels on the edge and both of them are down.

However, in addition to these two tunnels, if the logical topology includes a DR and an overlay segment (for example, a T0-LR has a transit logical switch between the T0-SR and the T0-DR), the tunnels (X1, Y2) and (X2, Y1) may also be created. Tunnel selection driven by the L2 span is based on a hash, so it may reuse tunnel (X1, Y1) or (X2, Y2), but it may also use a different tunnel, (X1, Y2) or (X2, Y1).
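The pairing described above can be sketched in a few lines of Python. This is only an illustrative model, not NSX code: the function name `classify_tunnels` is hypothetical, and it assumes HA tunnels pair VTEPs by position (first with first, second with second), as in the (X1, Y1), (X2, Y2) example.

```python
from itertools import product


def classify_tunnels(local_vteps, remote_vteps):
    """Split all possible VTEP pairs into HA pairs and cross pairs.

    Assumption (for illustration only): HA tunnels pair VTEPs by
    position, matching the (X1, Y1), (X2, Y2) example in the text.
    """
    ha = set(zip(local_vteps, remote_vteps))
    every_pair = set(product(local_vteps, remote_vteps))
    cross = every_pair - ha  # pairs that are not HA tunnels
    return ha, cross


ha_tunnels, cross_tunnels = classify_tunnels(["X1", "X2"], ["Y1", "Y2"])
print(sorted(ha_tunnels))     # [('X1', 'Y1'), ('X2', 'Y2')]
print(sorted(cross_tunnels))  # [('X1', 'Y2'), ('X2', 'Y1')]
```

In this model, passing a single VTEP per edge leaves the cross set empty, because the only possible pair is the HA pair itself.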

When edge-2 exits maintenance mode, new TEP tunnels (X1, Y2) or (X2, Y1) may be formed, and these tunnels are not added to the excluded list. Because such a tunnel is marked down immediately while it is coming up, routing is marked down, and BGP is marked down as a result.
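A minimal sketch of this failure sequence, under stated assumptions: the function `all_tunnels_down` and the dictionary layout are hypothetical and only model the behavior described here (excluded tunnels are ignored; if any remaining tunnel exists and all remaining tunnels are down, the condition fires). It does not reflect the actual NSX implementation.

```python
def all_tunnels_down(tunnel_state, excluded):
    """Return True when the modeled "All Tunnels Down" condition fires.

    tunnel_state maps a (local_vtep, remote_vtep) pair to True (up)
    or False (down). Tunnels listed in `excluded` are ignored; if no
    other tunnel remains, the condition does not fire.
    """
    considered = {t: up for t, up in tunnel_state.items() if t not in excluded}
    if not considered:
        return False
    return not any(considered.values())


# The two HA tunnels are on the excluded list.
excluded = {("X1", "Y1"), ("X2", "Y2")}

# Only the HA tunnels exist and both are down: no node-down event.
only_ha = {("X1", "Y1"): False, ("X2", "Y2"): False}
print(all_tunnels_down(only_ha, excluded))  # False

# A cross tunnel is created and is still down while coming up.
with_cross = dict(only_ha)
with_cross[("X1", "Y2")] = False
print(all_tunnels_down(with_cross, excluded))  # True
```

The second case corresponds to the situation in this article: the newly formed tunnel is not on the excluded list, so its initial down state is enough to satisfy the condition in this model.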

Resolution

Workaround:
1. Use only a single VTEP on the edge.
2. Add some VMs to the downlink segments.


The issue is resolved in VMware NSX 4.2.0.

Additional Information

If using the workaround, VMs can be added to the downlink segments at any time without a maintenance window.
If reducing from multiple VTEPs to a single VTEP, traffic may be impacted for a few seconds up to about a minute, depending on the scale.
Otherwise, a complete fix requires an upgrade to 4.2.0 or later.