When a physical switch interface for a VLAN used for BGP is bounced to simulate an edge failover, traffic fails once the interface is restored

Article ID: 432601

Updated On:

Products

VMware NSX

Issue/Introduction

  • A VMware NSX Tier 0 gateway is used in Active/Standby (A/S) mode, with 2 BGP interfaces on each edge node connecting to the physical router over 2 separate VLANs: 2 interfaces on VLAN A and 2 interfaces on VLAN B.
    • One VLAN A interface and one VLAN B interface are on the edge node where the active Tier 0 resides.
    • One VLAN A interface and one VLAN B interface are on the edge node where the standby Tier 0 resides.
  • When an interface on the physical switch that carries one of the VLANs, say VLAN A, is disabled, traffic diverts to VLAN B and there is no impact.
  • When the VLAN is enabled again on the physical switch, traffic passing through the Tier 0 gateway on the active edge now fails.
  • No Tier 0 high availability event has occurred; the active and standby edge nodes remain the same.
  • The physical router is configured with ECMP.
  • Before VLAN A was disabled, the Tier 0 was using the BGP next hop on VLAN A.
  • When the interface (VLAN A) is disabled, the Tier 0 BGP switches to the BGP next hop on VLAN B.
  • When the interface (VLAN A) is enabled again, the Tier 0 BGP continues using the VLAN B next hop.
  • To view the next hop, log in as admin on the edge node where the Tier 0 gateway resides.
  • Find the VRF number of the Tier 0 service router (SR) by running: get logical-routers
  • Then enter the Tier 0 SR VRF by running: vrf <number>
  • To see the BGP routing table, run: get bgp ipv4
  • To see the routing table, run: get route

Cause

If BGP receives 2 paths of equal cost, it prefers the one that was learned first (the oldest path) and selects it as the best path.
So when the interface (VLAN A) was brought down, the best path became the BGP next hop on VLAN B.
When the interface (VLAN A) came back up, VLAN B was now the oldest path and therefore still preferred, so the next hop remained on VLAN B.
However, BGP on the physical router reverted to using VLAN A.
Since the edge is using VLAN B but the physical router is using VLAN A, return traffic arrives on an interface the edge does not consider the best path back to the source, and it is dropped by the URPF (unicast reverse path forwarding) check.
Further details on URPF can be found in the article "North-South packets are dropped by rx_drop_rpf_check due to URPF restrictions".
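
The sequence described above can be illustrated with a small Python model. This is only a sketch under simplifying assumptions, not NSX or router code: the `Path` class, the logical timestamps, and the `strict_urpf_accepts` helper are hypothetical names introduced for illustration, the two paths are assumed to be equal in every other attribute, and the physical router is assumed to forward a flow onto VLAN A after the restore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    uplink: str      # hypothetical uplink label, e.g. "VLAN A"
    learned_at: int  # logical time at which the path was learned


def best_path(paths):
    """Among otherwise equal paths, prefer the one learned first (oldest)."""
    return min(paths, key=lambda p: p.learned_at)


def strict_urpf_accepts(ingress_uplink, paths_to_source):
    """Strict URPF model: accept a packet only if the best route back to
    its source leaves through the uplink the packet arrived on."""
    return best_path(paths_to_source).uplink == ingress_uplink


# Both BGP paths are learned; VLAN A happens to be learned first.
paths = [Path("VLAN A", 0), Path("VLAN B", 1)]
before = best_path(paths).uplink

# The VLAN A interface is disabled: only VLAN B remains.
paths = [p for p in paths if p.uplink != "VLAN A"]
during = best_path(paths).uplink

# VLAN A is restored and relearned as a newer path; VLAN B is now the oldest.
paths.append(Path("VLAN A", 3))
after = best_path(paths).uplink

# Assumption: the physical router sends traffic to the edge on VLAN A.
accepted = strict_urpf_accepts("VLAN A", paths)

print(before, during, after, accepted)
```

In this model the edge selects VLAN A before the bounce and VLAN B during and after it, so a packet arriving on VLAN A after the restore fails the reverse path check.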

Resolution

If VLAN A is to be preferred, configure the physical router so that VLAN B is the less preferred path.
For example, prepend the AS 3 times on the less preferred path.
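
The effect of the prepend can be sketched with the same kind of Python model. It relies on the fact that BGP compares AS path length before it falls back to the oldest-path tie-breaker. The comparison below is deliberately reduced to those two steps, and the AS number `65001`, the `advertise` helper, and the assumption that the prepend is applied to routes the physical router advertises to the edge over VLAN B are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    uplink: str      # hypothetical uplink label
    as_path: tuple   # AS numbers as seen by the edge
    learned_at: int  # logical time at which the path was learned


ROUTER_AS = 65001  # hypothetical AS number of the physical router


def advertise(uplink, learned_at, prepend=0):
    """Build the path the edge would learn, with optional AS prepends."""
    return Path(uplink, (ROUTER_AS,) * (1 + prepend), learned_at)


def best_path(paths):
    """Simplified selection: shorter AS path first, then the oldest path."""
    return min(paths, key=lambda p: (len(p.as_path), p.learned_at))


# State after the bounce: VLAN B is the older path, VLAN A was relearned later.
without_prepend = best_path(
    [advertise("VLAN B", 1), advertise("VLAN A", 3)]
).uplink

# Same timing, but the VLAN B advertisement carries 3 extra copies of the AS.
with_prepend = best_path(
    [advertise("VLAN B", 1, prepend=3), advertise("VLAN A", 3)]
).uplink

print(without_prepend, with_prepend)
```

Without the prepend the model keeps VLAN B because it is older. With the prepend the shorter AS path on VLAN A wins regardless of age, so the edge returns to VLAN A once it is restored and matches the path used by the physical router.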