NSX-T tier-0 logical router in an A/A topology, Internal BGP (iBGP) sessions are down between the service routers

Article ID: 318313

Updated On: 10-03-2024

Products

VMware NSX

Issue/Introduction

  • You recently upgraded to NSX-T 3.1, 3.1.1 or 3.1.2, and bgp_down alarms are generated indicating that BGP is down between two edge nodes in an active-active (A/A) cluster.
  • These alarms may also trigger when an edge node is replaced in an edge cluster: "All BGP/BFD sessions are down".
  • A tier-0 logical router in an A/A topology uses an inter Service Router (Inter-SR) iBGP routing feature to handle asymmetric routing failures. The Inter-SR iBGP sessions are never established.
  • From the tier-0 SR context on an NSX-T Data Center edge node, you can ping the iBGP peer IP address, but some packets may be lost.

 

Environment

VMware NSX-T Data Center

Cause

There is an issue with the Inter-SR routing ports in the internal Virtual Routing and Forwarding (VRF) context that causes two edge nodes to have the same MAC address.

In the example below, the same MAC address '<MAC_ADDR1>' is applied to the Inter-SR interfaces of NSX-T Data Center edge nodes 2 and 3 in a three-node edge cluster:

> get neighbor
Wed May 12 2021 UTC 13:05:39.304
Logical Router
UUID : ee98c58b-####-####-####-##########60
VRF : 9
LR-ID : 3082
Name : SR-Provider-Tier0
Type : SERVICE_ROUTER_TIER0
Neighbor
    Interface : 0cf071e2-####-####-####-##########7a
    IP : 10.xx.xx.1
    MAC : <MAC>
    State : reach
    Timeout : 341

    Interface : 79bfba70-####-####-####-##########3c
    IP : 169.254.0.131       <========== The Inter-SR link uses an IP in the 169.254.x.x range
    MAC : <MAC_ADDR1>
    State : reach
    Timeout : 860

    Interface : 0cf071e2-####-####-####-##########7a
    IP : 10.xx.xx.162
    MAC : <MAC>
    State : reach
    Timeout : 938

    Interface : 0cf071e2-####-####-####-##########7a
    IP : 10.xx.xx.2
    MAC : <MAC>
    State : reach
    Timeout : 599

    Interface : 0cf071e2-####-####-####-##########7a
    IP : 10.xx.xx.161
    MAC : <MAC>
    State : reach
    Timeout : 1031

    Interface : 79bfba70-####-####-####-##########3c
    IP : 169.254.0.132
    MAC : <MAC_ADDR1>
    State : reach
    Timeout : 577


Note: This issue can also occur on a two-node edge cluster.
This issue affects all versions of NSX-T 3.x prior to NSX-T 3.1.3.
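
The duplicate-MAC check on 'get neighbor' output can be scripted. The following is a minimal Python sketch, not a supported tool: it assumes the output has been saved as text in the "IP : ..." / "MAC : ..." layout shown above, and the function name, sample interface names and sample MAC addresses are hypothetical. It reports any MAC address that appears on more than one neighbor in the 169.254.0.0/16 range.

```python
import ipaddress
import re
from collections import defaultdict

# Inter-SR links use addresses in the 169.254.x.x range.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def find_duplicate_inter_sr_macs(neighbor_output):
    """Return {mac: [ips]} for MACs seen on more than one 169.254.x.x neighbor."""
    ips_by_mac = defaultdict(list)
    current_ip = None
    for raw in neighbor_output.splitlines():
        line = raw.strip()
        ip_match = re.match(r"IP\s*:\s*(\S+)", line)
        if ip_match:
            try:
                current_ip = ipaddress.ip_address(ip_match.group(1))
            except ValueError:
                current_ip = None  # masked addresses such as 10.xx.xx.1
            continue
        mac_match = re.match(r"MAC\s*:\s*(\S+)", line)
        if mac_match:
            if current_ip is not None and current_ip in LINK_LOCAL:
                ips_by_mac[mac_match.group(1).lower()].append(str(current_ip))
            current_ip = None  # each MAC line closes one neighbor entry
    return {mac: ips for mac, ips in ips_by_mac.items() if len(ips) > 1}


# Hypothetical sample data; real output would come from the edge node CLI.
SAMPLE = """\
Neighbor
    Interface : uplink-1
    IP : 10.0.0.1
    MAC : 02:00:00:aa:bb:10
    State : reach

    Interface : inter-sr-1
    IP : 169.254.0.131       <== Inter-SR link
    MAC : 02:00:00:AA:BB:01
    State : reach

    Interface : inter-sr-1
    IP : 169.254.0.132
    MAC : 02:00:00:aa:bb:01
    State : reach
"""

if __name__ == "__main__":
    for mac, ips in find_duplicate_inter_sr_macs(SAMPLE).items():
        print(f"Duplicate Inter-SR MAC {mac} on peers: {', '.join(ips)}")
```

An empty result means no duplicate MAC address was found among the Inter-SR neighbors in the supplied output.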

Resolution

This issue is resolved in NSX-T 3.1.3.

Workaround:
Alternatively, use the following steps:

1. Turn off the Inter-SR iBGP option in the NSX Manager UI. This deletes all internal routing ports and iBGP sessions.
Networking -> Tier-0 Gateways -> Edit the tier-0 gateway -> BGP -> Turn off 'Inter SR iBGP'



2. Turn on the Inter-SR iBGP option again, which will create new internal routing ports without the duplicate MAC addresses, allowing the iBGP sessions to successfully establish.
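
The same off/on toggle could in principle be scripted instead of using the UI. The Python sketch below only builds the two requests and does not send them. It assumes that the NSX Policy API exposes the BGP configuration at /policy/api/v1/infra/tier-0s/<tier-0-id>/locale-services/<locale-service-id>/bgp with a boolean field named inter_sr_ibgp. Both are assumptions that this article does not confirm, so verify them against the NSX API guide for your version. The manager host name, gateway ID and locale-service ID are hypothetical, and authentication is omitted.

```python
import json
import urllib.request


def build_inter_sr_ibgp_request(manager, tier0_id, locale_service_id, enabled):
    """Build (but do not send) a PATCH request that sets Inter-SR iBGP on or off."""
    # Assumed Policy API path and field name; check the NSX API guide.
    url = (
        f"https://{manager}/policy/api/v1/infra/tier-0s/{tier0_id}"
        f"/locale-services/{locale_service_id}/bgp"
    )
    body = json.dumps({"inter_sr_ibgp": enabled}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


def build_toggle_requests(manager, tier0_id, locale_service_id):
    """Return the two requests for the workaround, in order: off, then on."""
    return [
        build_inter_sr_ibgp_request(manager, tier0_id, locale_service_id, False),
        build_inter_sr_ibgp_request(manager, tier0_id, locale_service_id, True),
    ]


if __name__ == "__main__":
    # Hypothetical identifiers for illustration only.
    for req in build_toggle_requests("nsx.example.com", "t0-gw", "default"):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

If this were sent to a real environment, the second request should be sent only after the first has been realized, so that the internal routing ports are actually deleted and then recreated.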

Additional Information

Impact/Risks:
The Inter-SR iBGP sessions will not work, and the datapath will be impacted for asymmetric and ECMP topologies.