BGP Flaps Caused by Traffic on NSX Edge

Article ID: 385542

Updated On:

Products

VMware NSX

Issue/Introduction

  • BGP sessions on the NSX Edge intermittently go down and come back up (flap).
  • Running the 'get dataplane cpu stats' CLI command on the Edge shows some cores at or near 100% usage, while other cores carry a much lighter load. The overloaded cores can drop packets when their receive queues overflow, and interface statistics show an increase in the 'RX misses' counter.
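
As a rough illustration of the core imbalance described above, the following Python sketch takes per-core usage percentages that have been read from the 'get dataplane cpu stats' output and entered by hand, then reports whether some cores are saturated while others are lightly loaded. The function names, the dictionary input, and the 95% and 50% thresholds are assumptions chosen for this example. The sketch does not parse the CLI output, because the exact output format is not covered in this article.

```python
def find_hot_cores(usage_by_core, hot_threshold=95.0, light_threshold=50.0):
    """Split cores into saturated and lightly loaded groups.

    usage_by_core maps a core id to its usage in percent, transcribed
    manually from the Edge CLI. Both thresholds are illustrative
    assumptions, not values defined by NSX.
    """
    hot = sorted(c for c, u in usage_by_core.items() if u >= hot_threshold)
    light = sorted(c for c, u in usage_by_core.items() if u <= light_threshold)
    return hot, light


def is_skewed(usage_by_core, hot_threshold=95.0, light_threshold=50.0):
    """Return True when at least one core is saturated and at least one is light."""
    hot, light = find_hot_cores(usage_by_core, hot_threshold, light_threshold)
    return bool(hot) and bool(light)
```

For example, calling is_skewed({0: 99.5, 1: 12.0, 2: 97.0, 3: 60.0}) returns True, which matches the symptom of a few cores at or near 100% while others are much less busy. If every core is saturated, the function returns False, which would point to overall load rather than a skewed traffic pattern.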

Environment

VMware NSX
VMware NSX-T Data Center

Cause

This can be caused by a traffic pattern in which traffic bursts in specific flows overload individual Edge dataplane cores.
BGP neighbors may flap when control traffic is lost because the Edge dataplane CPUs are overwhelmed by traffic.
When this happens, /var/log/syslog shows messages containing BGP_DOWN and BGP_UP.
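
To gauge how often the sessions flap, the log can be scanned for these two markers. The Python sketch below is a minimal example that only looks for the substrings BGP_DOWN and BGP_UP. It makes no assumption about the rest of the log line format, and the function names are hypothetical helpers introduced for this illustration.

```python
from collections import Counter


def count_bgp_events(lines):
    """Count log lines that contain the BGP_DOWN or BGP_UP markers."""
    counts = Counter()
    for line in lines:
        if "BGP_DOWN" in line:
            counts["BGP_DOWN"] += 1
        if "BGP_UP" in line:
            counts["BGP_UP"] += 1
    return counts


def bgp_event_lines(lines):
    """Return the matching lines, in order, with trailing newlines removed."""
    return [
        line.rstrip("\n")
        for line in lines
        if "BGP_DOWN" in line or "BGP_UP" in line
    ]


def summarize_file(path="/var/log/syslog"):
    """Read a log file and return the event counts for it."""
    # errors="replace" avoids failing on bytes that are not valid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_bgp_events(handle)
```

Running summarize_file() on the Edge, or on a copy of its syslog, returns a counter of both markers. Repeated BGP_DOWN and BGP_UP pairs within a short period are consistent with flapping, although the counts alone do not show whether traffic load or a connectivity loss is the cause.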

BGP may also flap for other reasons, such as loss of network connectivity. The traffic-related issue described in this article is harder to diagnose because some traffic is still flowing between the BGP neighbors.

Resolution

Control traffic prioritization helps avoid this problem, in particular with NSX versions 4.0 and later. However, control messages inside GENEVE tunnels are currently not prioritized, and not all NICs are supported for this feature.

Additional Information

Monitor traffic patterns with tools such as Aria Operations for Networks to help diagnose traffic-related problems.

BGP neighborship is down alarm
Troubleshooting BGP on NSX-T Edge Nodes
BGP session diagnostics for troubleshooting BGP session flaps on NSX-T edge node