Inter SR BGP peering flaps when route count exceeds Configuration Maximum

Article ID: 322442

Products

VMware NSX

Issue/Introduction

Symptoms:
The NSX UI may show recurring BGP alarms.
 
/var/log/frr/frr.log on the Edge shows "Hold timer expire" for the Inter-SR BGP IPs (169.254.0.130 or 169.254.0.131) repeatedly, a few seconds apart (5 seconds in the sample below):
root@edge:~# grep 'Hold timer expire' /var/log/frr/frr.log
2021/06/23 20:35:10.854171 BGP: 169.254.0.130 [FSM] Hold timer expire
2021/06/23 20:35:15.854550 BGP: 169.254.0.130 [FSM] Hold timer expire
2021/06/23 20:35:20.855850 BGP: 169.254.0.130 [FSM] Hold timer expire
2021/06/23 20:35:25.857472 BGP: 169.254.0.130 [FSM] Hold timer expire
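
To quantify how often the hold timer expires, the log lines can be parsed and the gap between consecutive expiries computed per peer. The following is a minimal sketch that assumes the frr.log line layout shown in the sample above; the function name `hold_timer_intervals` is illustrative and not part of any NSX tooling.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2021/06/23 20:35:10.854171 BGP: 169.254.0.130 [FSM] Hold timer expire
HOLD_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) BGP: (\S+) \[FSM\] Hold timer expire"
)

def hold_timer_intervals(lines):
    """Return {peer_ip: [seconds between consecutive hold timer expiries]}."""
    last_seen = {}
    gaps = {}
    for line in lines:
        match = HOLD_RE.match(line.strip())
        if not match:
            continue  # ignore unrelated log lines
        stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f")
        peer = match.group(2)
        gaps.setdefault(peer, [])
        if peer in last_seen:
            gaps[peer].append((stamp - last_seen[peer]).total_seconds())
        last_seen[peer] = stamp
    return gaps
```

Feeding it the output of the grep command above (for example, the lines read from a saved copy of frr.log) gives one list of intervals per Inter-SR peer address.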
 
/var/log/syslog shows BGP state flapping constantly:
root@edge:~# grep "state=BGP" /var/log/syslog
2021-06-15T22:30:40.999931+00:00 NSX 5182 FABRIC [nsx@6876 comp="nsx-edge" subcomp="rcpm" s2comp="routing-service-realization" level="INFO"] Alarm for BGP 169.254.0.130, peer_uuid: <UUID> in SR: <UUID>, state=BGP_UP
2021-06-15T22:30:44.993615+00:00 NSX 5182 FABRIC [nsx@6876 comp="nsx-edge" subcomp="rcpm" s2comp="routing-service-realization" level="INFO"] Alarm for BGP 169.254.0.130, peer_uuid: <UUID> in SR: <UUID>, state=BGP_DOWN
2021-06-15T22:30:46.009058+00:00 NSX 5182 FABRIC [nsx@6876 comp="nsx-edge" subcomp="rcpm" s2comp="routing-service-realization" level="INFO"] Alarm for BGP 169.254.0.130, peer_uuid: <UUID> in SR: <UUID>, state=BGP_UP
2021-06-15T22:30:49.994782+00:00 NSX 5182 FABRIC [nsx@6876 comp="nsx-edge" subcomp="rcpm" s2comp="routing-service-realization" level="INFO"] Alarm for BGP 169.254.0.130, peer_uuid: <UUID> in SR: <UUID>, state=BGP_DOWN
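
The flapping can also be summarized by counting the BGP_UP and BGP_DOWN alarms per peer. This sketch assumes the syslog message layout shown in the sample above; `count_bgp_states` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the "Alarm for BGP <ip>, ... state=BGP_UP|BGP_DOWN" part of the syslog line.
ALARM_RE = re.compile(r"Alarm for BGP (\S+?), .*state=(BGP_UP|BGP_DOWN)")

def count_bgp_states(lines):
    """Count BGP_UP / BGP_DOWN alarms per peer IP."""
    counts = Counter()
    for line in lines:
        match = ALARM_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

A high and roughly equal number of BGP_UP and BGP_DOWN entries for the same Inter-SR peer within a short period is consistent with the flapping described here.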


Cause

If the size of a BGP UPDATE packet exchanged between the Inter-SR interfaces exceeds the MTU along the datapath, the packet is dropped and the Inter-SR BGP peering flaps as it repeatedly tries to become established.
 
Example command to check the number of routes on the Tier-0 SR:
edge(tier0_sr)> get route | count via
Number of lines that match pattern 'via': XXXX
 
To verify that UPDATE packets exceed the MTU size:
1. Get the UUID of the Inter-SR HA interface:
edge(tier0_sr)> get interfaces
 
2. Exit to the Edge shell, start a packet capture on the Inter-SR port, and write it to a file:
edge(tier0_sr)> exit
edge> set capture session 1 interface <UUID> direction dual
edge> set capture session 1 file <filename>
 
3. Examine the packet capture in Wireshark. This KB applies when the capture contains one or more UPDATE packets and the initial UPDATE packet exceeds the MTU size.
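
As a complement to the Wireshark inspection in step 3, a capture file can also be scanned programmatically. The sketch below assumes that the file written by the capture session is in classic pcap format with untagged Ethernet framing; both are assumptions and are not confirmed by this article. The function name `oversized_frames` is illustrative.

```python
import struct

# Classic pcap magic numbers mapped to the byte order of the file.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}
GLOBAL_HEADER_LEN = 24
RECORD_HEADER_LEN = 16
ETHERNET_HEADER_LEN = 14  # assumption: untagged Ethernet frames

def oversized_frames(pcap_bytes, mtu):
    """Return [(frame_index, l3_length)] for frames whose L3 size exceeds mtu."""
    endian = PCAP_MAGICS.get(pcap_bytes[:4])
    if endian is None:
        raise ValueError("not a classic pcap file")
    offset = GLOBAL_HEADER_LEN
    index = 0
    hits = []
    while offset + RECORD_HEADER_LEN <= len(pcap_bytes):
        # Record header: ts_sec, ts_frac, captured length, original length.
        _sec, _frac, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset
        )
        l3_len = orig_len - ETHERNET_HEADER_LEN
        if l3_len > mtu:
            hits.append((index, l3_len))
        offset += RECORD_HEADER_LEN + incl_len
        index += 1
    return hits
```

For example, `oversized_frames(open("<filename>", "rb").read(), 1500)` lists the frames in the capture that would not fit a 1500-byte MTU. The frame index is zero-based, whereas Wireshark numbers frames from 1.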
To check if there is any lower MTU along the network path:
1. Get the Inter-SR port MTU and perform a ping test with that packet size.
Example:
If the Inter-SR port MTU is 1500 and the Inter-SR neighbor address is 169.254.0.131, then run the following ping command:
edge(tier0_sr)> ping 169.254.0.131 size 1500 dfbit enable
If the ping fails, an interface somewhere along the network path has a lower MTU.

A BGP UPDATE packet can exceed the MTU when it carries enough prefixes to make the packet larger than the MTU.
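
As a rough illustration of how the prefix count drives packet size, the sketch below estimates the IP packet size of a single IPv4 UPDATE using the message layout from RFC 4271 (19-byte BGP header, two 2-byte length fields, and one length byte plus the minimum number of address bytes per prefix). The path attribute size of 30 bytes, the /24 prefix length, and the absence of IP and TCP options are assumptions chosen for illustration; real values vary. In practice TCP also limits each segment to its MSS, so this is an upper-level estimate rather than an exact on-wire size.

```python
import math

BGP_HEADER_LEN = 19    # marker (16) + length (2) + type (1), RFC 4271
UPDATE_FIXED_LEN = 4   # withdrawn routes length (2) + total path attribute length (2)
IPV4_HEADER_LEN = 20   # assumption: no IP options
TCP_HEADER_LEN = 20    # assumption: no TCP options

def nlri_len(prefix_len):
    """Bytes needed to encode one IPv4 prefix: length byte + address bytes."""
    return 1 + math.ceil(prefix_len / 8)

def update_packet_size(num_prefixes, prefix_len=24, path_attr_len=30):
    """Estimated IP packet size of one UPDATE carrying num_prefixes prefixes."""
    bgp_len = (BGP_HEADER_LEN + UPDATE_FIXED_LEN + path_attr_len
               + num_prefixes * nlri_len(prefix_len))
    return IPV4_HEADER_LEN + TCP_HEADER_LEN + bgp_len

def min_prefixes_to_exceed(mtu, prefix_len=24, path_attr_len=30):
    """Smallest prefix count whose estimated packet size is larger than mtu."""
    overhead = update_packet_size(0, prefix_len, path_attr_len)
    if overhead > mtu:
        return 0
    return (mtu - overhead) // nlri_len(prefix_len) + 1
```

Under these assumptions, a few hundred /24 prefixes sharing the same attributes are enough for one UPDATE to outgrow a 1500-byte MTU.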

Resolution

In general, the Inter-SR port MTU, Global logical MTU (or Edge VTEP MTU), ESX PNIC MTU, and TOR MTU must have the following relationship:
Inter-SR port MTU < Global logical MTU <= ESX PNIC MTU == TOR MTU
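
The relationship above can be checked mechanically once the four MTU values have been collected. This is a minimal sketch; the function and parameter names are hypothetical, and the values must be gathered manually from the Edge, ESX host, and TOR.

```python
def check_mtu_relationship(inter_sr, global_logical, esx_pnic, tor):
    """Return a list of violations of:
    Inter-SR port MTU < Global logical MTU <= ESX PNIC MTU == TOR MTU
    An empty list means the relationship holds."""
    problems = []
    if not inter_sr < global_logical:
        problems.append(
            f"Inter-SR port MTU ({inter_sr}) must be lower than "
            f"Global logical MTU ({global_logical})"
        )
    if not global_logical <= esx_pnic:
        problems.append(
            f"Global logical MTU ({global_logical}) must not exceed "
            f"ESX PNIC MTU ({esx_pnic})"
        )
    if esx_pnic != tor:
        problems.append(
            f"ESX PNIC MTU ({esx_pnic}) must equal TOR MTU ({tor})"
        )
    return problems
```

For example, `check_mtu_relationship(1500, 1500, 1600, 9000)` reports two violations: the Inter-SR port MTU is not lower than the Global logical MTU, and the ESX PNIC and TOR MTUs differ.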

In the case of Federation, ICMP errors ("Fragmentation needed") should be enabled on the TOR so that the Edge can perform PMTU discovery and fragment packets as needed.

These settings are recommended to account for the GENEVE encapsulation overhead.