The 'get dataplane cpu stats' CLI command on the edge will show that some cores are at (or close to) 100% usage, while other cores may show much lighter load. This can cause packet drops due to receive queue overflow, and interface statistics will show an increase in the 'RX misses' counter.
VMware NSX
VMware NSX-T Data Center
This can be caused by a traffic pattern in which bursts occur in specific flows, so that the cores handling those flows are saturated while other cores remain lightly loaded.
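The uneven core load described above can be checked with a few lines of Python once the per-core usage percentages have been read from the CLI output. This is only a minimal sketch: the function names, the 90% and 50% thresholds, and the sample values are assumptions made for illustration and are not part of NSX. The output format of the CLI command is not parsed here; the usage values are entered by hand.

```python
def find_hot_cores(usage_by_core, hot_threshold=90.0, light_threshold=50.0):
    """Split cores into 'hot' and 'lightly loaded' lists.

    usage_by_core maps a core number to its usage percentage, copied by
    hand from the CLI output. Both thresholds are illustrative assumptions.
    """
    hot = sorted(c for c, u in usage_by_core.items() if u >= hot_threshold)
    light = sorted(c for c, u in usage_by_core.items() if u < light_threshold)
    return hot, light


def is_imbalanced(usage_by_core, hot_threshold=90.0, light_threshold=50.0):
    """True when at least one core is hot while another is lightly loaded."""
    hot, light = find_hot_cores(usage_by_core, hot_threshold, light_threshold)
    return bool(hot) and bool(light)


# Hypothetical sample values, not taken from a real edge node.
sample = {0: 99.0, 1: 12.5, 2: 97.5, 3: 8.0}
hot, light = find_hot_cores(sample)
print("hot cores:", hot, "lightly loaded cores:", light)
print("imbalanced:", is_imbalanced(sample))
```

If every core is busy, the load is high but not uneven, and the check returns False. That case points to overall capacity rather than to bursts in specific flows.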
BGP neighbors may flap when control traffic is lost due to Edge dataplane CPUs being overwhelmed by traffic. /var/log/syslog will show messages containing BGP_DOWN and BGP_UP.
There are other reasons why BGP may flap, e.g., network connectivity loss. The traffic-related issue noted in this article is harder to diagnose, as some traffic is still flowing between the BGP neighbors.
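To get a quick count of flap events, the syslog file can be scanned for the two markers mentioned above. The following is a minimal sketch that matches only the literal strings BGP_DOWN and BGP_UP. It assumes nothing about the rest of the message format. The function names are hypothetical, and the sample lines are invented placeholders, not real NSX log output.

```python
MARKERS = ("BGP_DOWN", "BGP_UP")


def count_bgp_events(lines):
    """Count the lines that contain each marker string."""
    counts = {marker: 0 for marker in MARKERS}
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


def scan_syslog(path="/var/log/syslog"):
    """Count marker occurrences in a log file, tolerating odd bytes."""
    with open(path, errors="replace") as handle:
        return count_bgp_events(handle)


# Invented placeholder lines; real messages will look different.
sample_lines = [
    "placeholder message with BGP_DOWN in it",
    "unrelated placeholder message",
    "placeholder message with BGP_UP in it",
]
print(count_bgp_events(sample_lines))
```

On an edge node, calling scan_syslog() with the default path gives the totals for the current log file. Repeated BGP_DOWN and BGP_UP pairs while some traffic is still flowing between the neighbors are consistent with the traffic-related flap described in this article, but they do not rule out the other causes.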
Control traffic prioritization helps avoid this problem, in particular with NSX versions 4.0 and later. However, control messages inside GENEVE tunnels are currently not prioritized, and not all NICs are supported for this feature.
Monitor traffic patterns using tools like Aria Operations for Networks to help diagnose traffic-related problems.
BGP neighborship is down alarm
Troubleshooting BGP on NSX-T Edge Nodes
BGP session diagnostics for troubleshooting BGP session flaps on NSX-T edge node