BGP flapping and Edge NIC transmit queue overflow alarms are observed in the Manager UI



Article ID: 390907


Products

VMware NSX

Issue/Introduction

Edge NIC transmit queue overflow alarms are raised in the NSX Manager UI, and BGP flapping is observed at around the same time.

Log messages similar to the following are typically present:

/var/log/syslog:
2025-03-14T19:52:14.425z <NSX-manager-FQDN> NSX 12047 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" entId="67c6cd9b-xxxx-xxxx-xxxx-fad16eb92313" tid="12165" level="FATAL" eventState="On" eventFeatureName="edge_health" eventSev="critical" eventType="edge_nic_transmit_queue_overflow"] Edge NIC fp-eth0 transmit queue 2 has overflowed by 2.352738% on Edge node <node-id>. The missed packet count is 8994 and processed packet count is 501398.

 

/var/log/frr/frr.log:
2025/03/14 19:53:14.57512 BGP: %NOTIFICATION: received from neighbor <neighbor-ip-address> 4/0 (Hold Timer Expired) 0 bytes
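
To confirm that the alarms and the BGP resets really occur close together, the two log formats shown above can be matched and compared by timestamp. The following is a minimal Python sketch, not an official tool. The function names, the 5-minute correlation window, and the assumption that both logs use the same time zone are illustrative choices, and the patterns are written only against the sample lines in this article.

```python
import re
from datetime import datetime, timedelta

# Pattern for the Edge NIC overflow alarm text shown in the syslog sample.
ALARM_RE = re.compile(
    r'^(?P<ts>\S+) .*eventType="edge_nic_transmit_queue_overflow".*'
    r"Edge NIC (?P<nic>\S+) transmit queue (?P<queue>\d+) has overflowed by "
    r"(?P<pct>[\d.]+)%.*missed packet count is (?P<missed>\d+) "
    r"and processed packet count is (?P<processed>\d+)"
)

# Pattern for the "Hold Timer Expired" notification shown in the frr.log sample.
HOLD_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) BGP: %NOTIFICATION: "
    r"received from neighbor (?P<neighbor>\S+) 4/0 \(Hold Timer Expired\)"
)


def parse_alarm(line):
    """Return the alarm fields as a dict, or None if the line does not match."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    # Drop the trailing zone letter; the time zone itself is not interpreted here.
    ts = datetime.strptime(m["ts"].rstrip("zZ"), "%Y-%m-%dT%H:%M:%S.%f")
    return {
        "time": ts,
        "nic": m["nic"],
        "queue": int(m["queue"]),
        "overflow_pct": float(m["pct"]),
        "missed": int(m["missed"]),
        "processed": int(m["processed"]),
    }


def parse_hold_expiry(line):
    """Return the time and neighbor of a hold-timer expiry, or None."""
    m = HOLD_RE.search(line)
    if m is None:
        return None
    whole, frac = m["ts"].split(".")
    # strptime accepts at most 6 fractional digits, so trim longer fractions.
    ts = datetime.strptime(f"{whole}.{frac[:6]}", "%Y/%m/%d %H:%M:%S.%f")
    return {"time": ts, "neighbor": m["neighbor"]}


def correlate(alarms, expiries, window=timedelta(minutes=5)):
    """Pair each alarm with every hold-timer expiry inside the window.

    Assumption: both logs record time in the same zone.
    """
    return [
        (a, e)
        for a in alarms
        for e in expiries
        if abs(e["time"] - a["time"]) <= window
    ]
```

Feeding each line of the two log files through `parse_alarm` and `parse_hold_expiry`, discarding the `None` results, and passing the remaining entries to `correlate` gives the alarm and BGP reset pairs that fall within the chosen window.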

Environment

VMware NSX-T Data Center

VMware NSX

VMware ESXi

Cause

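If the ESXi host reports a very high percentage (>5%) of CRC errors on its vmnics, the Tx queues can become busy. This affects the Edge VM running on that host: when packets are missed on the Edge NIC transmit queue, the BGP neighbor's hold timer can expire (as in the frr.log message above), which causes BGP to flap.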
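
The 5% threshold can be checked against a vmnic's counters with a short calculation. The Python sketch below is illustrative only. It assumes the counters are supplied as `Name: value` text, for example as printed by `esxcli network nic stats get -n <vmnic>`. It also assumes the field names `Packets received` and `Receive CRC errors`, and that the percentage is defined as CRC errors divided by packets received. Verify all three against the actual host output before relying on the result.

```python
import re

CRC_THRESHOLD_PCT = 5.0  # threshold quoted in the Cause section


def parse_nic_stats(text):
    """Collect 'Name: <integer>' lines into a dict; other lines are ignored."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def crc_error_pct(stats):
    """CRC errors as a percentage of received packets (assumed definition)."""
    received = stats.get("Packets received", 0)
    crc_errors = stats.get("Receive CRC errors", 0)
    if received == 0:
        return 0.0
    return 100.0 * crc_errors / received


def exceeds_threshold(stats, threshold_pct=CRC_THRESHOLD_PCT):
    """True when the CRC error percentage is above the threshold."""
    return crc_error_pct(stats) > threshold_pct
```

Running the same check against each candidate host's vmnics is one way to judge whether a host is affected.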

Resolution

vMotion the Edge VM to another, stable host that is not showing a high percentage of CRC errors on its vmnics.

For the ESXi host fix, refer to the KB article: Troubleshooting NIC errors and other network traffic faults in ESXi