Receiving Edge NIC transmit queue overflow alarms and been experiencing BGP flapping around the same time
You will likely see log messages similar to these below:
var/log/syslog:
2025-03-14T19:52:14.425z <NSX-manager-FQDN> NSX 12047 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" entId="67c6cd9b-xxxx-xxxx-xxxx-fad16eb92313" tid="12165" level="FATAL" eventState="On" eventFeatureName="edge_health" eventSev="critical" eventType="edge_nic_transmit_queue_overflow"] Edge NIC fp-eth0 transmit queue 2 has overflowed by 2.352738% on Edge node <node-id>. The missed packet count is 8994 and processed packet count is 501398.
var/log/frr/frr.log:
2025/03/14 19:53:14.57512 BGP: %NOTIFICATION: received from neighbor <neighbor-ip-address> 4/0 (Hold Timer Expired) 0 bytesVMware NSX-T Data Center
VMware NSX
VMware ESXi
If the host is generating very high percentage (>5%) of CRC errors on vmnics, it can cause the Tx queues to be busy, affecting the edge VM present on the host and thus, causing the BGP to flap.
vMotion the Edge VM to another stable host, which is not showing high percentage of CRC errors on its vmnics.
Refer to this kb for ESXi host fix: Troubleshooting NIC errors and other network traffic faults in ESXi