BGP neighborship between the edge and the TOR breaks when the 2nd edge in the cluster exits maintenance mode
The customer has a T0 Active-Active Edge cluster with two edge nodes, and the edges are connected as follows:
          CSR
         |   |
       BGP   BGP
         |   |
     node1 --isr-- node2
     Active        Active
When the customer places one of the two edges (edge-1) in maintenance mode and then exits maintenance mode, the BGP neighborship on the other edge (edge-2) breaks and moves from the Established state to the Idle state.
Suppose each edge has two VTEPs: X1 and X2 on one edge, Y1 and Y2 on the other. Two tunnels are created for HA: (X1, Y1) and (X2, Y2). These two tunnels are excluded when evaluating the "All Tunnels Down" scenario, i.e. node down is not triggered if these are the only two tunnels on the edge and both of them are down.
However, in addition to these two tunnels, if the logical topology includes a DR and an overlay segment (for example, a T0-LR has a transit logical switch between the T0-SR and the T0-DR), the tunnels (X1, Y2) and (X2, Y1) may also be created. The tunnel driven by L2 span is chosen by a hash, so it may reuse tunnel (X1, Y1) or (X2, Y2), but it may also use a different tunnel, (X1, Y2) or (X2, Y1).
When edge-2 exits maintenance mode, the new TEP tunnels (X1, Y2) or (X2, Y1) may be formed, and these tunnels are not added to the excluded list. Such a tunnel is marked down immediately while it is coming up, so routing is marked down, and BGP is marked down as a result.
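The behaviour described above can be illustrated with a small toy model. This is only a sketch of the reasoning, not the product's implementation: the function names (`ha_tunnels`, `all_tunnels_down`), the index-based pairing of VTEPs, and the dictionary representation of tunnel state are all assumptions made for illustration.

```python
# Toy model only: names and data structures are illustrative assumptions,
# not the actual edge implementation.

def ha_tunnels(local_vteps, remote_vteps):
    """Pair VTEPs by position to get the HA tunnels, e.g. (X1, Y1), (X2, Y2)."""
    return set(zip(local_vteps, remote_vteps))


def all_tunnels_down(tunnels, excluded):
    """Return True when at least one non-excluded tunnel exists and
    every non-excluded tunnel is down.

    tunnels maps (local_vtep, remote_vtep) -> True if up, False if down.
    """
    considered = {t: up for t, up in tunnels.items() if t not in excluded}
    return bool(considered) and not any(considered.values())


local = ["X1", "X2"]
remote = ["Y1", "Y2"]
excluded = ha_tunnels(local, remote)

# Case 1: only the two HA tunnels exist and both are down.
# They are excluded, so node down is not triggered.
only_ha = {("X1", "Y1"): False, ("X2", "Y2"): False}
node_down_ha_only = all_tunnels_down(only_ha, excluded)

# Case 2: a cross tunnel (X1, Y2) is formed and is still down while coming up.
# It is not in the excluded list, so the check now reports all tunnels down.
with_cross = dict(only_ha)
with_cross[("X1", "Y2")] = False
node_down_with_cross = all_tunnels_down(with_cross, excluded)

print(node_down_ha_only)     # False
print(node_down_with_cross)  # True
```

In this model, the second case corresponds to the failure: the non-excluded tunnel is down, the node is treated as having all tunnels down, and routing (and therefore BGP) is marked down.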
Workaround:
1. Use only a single VTEP for the edge.
2. Add some VMs to the downlink segments.
The issue is resolved in version 4.2.0.
If using the workaround, VMs can be added to a downlink segment at any time without a maintenance window.
Reducing multiple VTEPs to a single VTEP can impact traffic for a few seconds up to about a minute, depending on the scale.
Otherwise, a complete fix requires an upgrade to 4.2.0 or later.