Edge dataplane outage and "Edge NIC fp-eth1 transmit ring buffer has overflowed"

Article ID: 427070

Products

VMware NSX VMware vSphere ESXi

Issue/Introduction

  • Edge dataplane outage observed for North-South traffic
  • The following alarm may also be raised during this time:

Edge NIC fp-eth# transmit ring buffer has overflowed by 57.607269% on Edge node <edge-node-uuid>. The missed packet count is #### and processed packet count is ####.
Recommended Action
1. If the hypervisor hosts many VMs alongside the Edge VM, the Edge VM might not get enough CPU time to run, so the hypervisor might not retrieve its packets in time. In that case, consider migrating the Edge VM to a host with fewer VMs.
2. Increase the ring size by 1024 using the command `set dataplane ring-size tx <ring-size>`. If the issue persists after increasing the ring size, contact VMware Support, as the ESX-side transmit ring buffer might be set to a lower value. If there is no issue on the ESX side, the Edge needs to be scaled to a larger form factor to accommodate the traffic.
3. If the alarm keeps flapping (that is, it triggers and resolves again shortly afterwards), it is likely caused by bursty traffic. In this case, check the TX pps using the command `get dataplane cpu stats`. If the pps is not high while the alarm is active, contact VMware Support. If the pps is high, that confirms bursty traffic; consider suppressing the alarm.
NOTE: There is no specific benchmark for what counts as a high pps value; it depends on the infrastructure and the type of traffic. Compare the pps noted while the alarm is inactive with the pps noted while it is active.
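Step 3 above boils down to two checks: is the alarm actually flapping, and is the TX pps clearly higher while the alarm is active than while it is inactive? The following is a minimal Python sketch of that comparison, assuming you have noted pps readings by hand (for example from `get dataplane cpu stats`) together with the alarm state at each time. The `PpsSample` structure, the function names, and the default thresholds are hypothetical illustrations, not part of NSX; the thresholds in particular are arbitrary because, as the NOTE says, there is no fixed benchmark for a "high" pps value.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PpsSample:
    """One TX pps reading noted during the investigation (hypothetical structure)."""
    timestamp: float    # seconds since epoch when the reading was taken
    tx_pps: float       # TX packets per second noted for the fp-eth# interface
    alarm_active: bool  # whether the transmit ring buffer alarm was active at that time


def count_transitions(samples: list[PpsSample]) -> int:
    """Count how often the alarm switched between active and inactive."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    return sum(
        1 for prev, cur in zip(ordered, ordered[1:])
        if prev.alarm_active != cur.alarm_active
    )


def assess(samples: list[PpsSample],
           ratio_threshold: float = 2.0,
           min_transitions: int = 2) -> str:
    """Apply the flapping/pps comparison from step 3 to a set of readings.

    ratio_threshold and min_transitions are arbitrary placeholders that you
    would tune for your own infrastructure and traffic profile.
    """
    if count_transitions(samples) < min_transitions:
        return "not flapping"
    active = [s.tx_pps for s in samples if s.alarm_active]
    inactive = [s.tx_pps for s in samples if not s.alarm_active]
    if not active or not inactive:
        return "insufficient data"
    # Guard against division by zero if the inactive baseline is idle.
    ratio = mean(active) / max(mean(inactive), 1.0)
    if ratio >= ratio_threshold:
        return "bursty"           # pps is high while active: consider suppressing the alarm
    return "contact support"      # pps is not high while active


readings = [
    PpsSample(0, 10_000, False),
    PpsSample(10, 50_000, True),
    PpsSample(20, 11_000, False),
    PpsSample(30, 48_000, True),
    PpsSample(40, 9_000, False),
]
print(assess(readings))
```

Using a ratio against the inactive baseline, rather than an absolute pps number, mirrors the NOTE's advice to compare the alarm-active and alarm-inactive periods of the same Edge.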

  • No high traffic seen on the ESXi host vmnics where the edge resides.
  • The ESXi host has Intel(R) Ethernet Controller E810-C for SFP vmnics running the icen driver.
  • The following log entries can be seen on the NSX Manager node in /var/log/syslog:
var/log/syslog.49:2025-12-14T13:54:49.237Z <nsx-mgr> NSX 3646788 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" entId="fb6a4af8-####-####-9012-3381ac######" tid="3647006" level="FATAL" eventState="On" eventFeatureName="edge_health" eventSev="critical" eventType="edge_nic_out_of_transmit_buffer"] Edge NIC fp-eth# transmit ring buffer has overflowed by 57.607269% on Edge node <edge-node-uuid>. The missed packet count is #### and processed packet count is ####.
  • On the associated ESXi host, a hang was detected on the Edge vNIC and its port was disabled:
2025-12-12T10:16:30.337Z In(182) vmkernel: cpu33:2185809)Vmxnet3: 19375: <edge>.eth#,##:##:##:##:##:##, portID(#########): Hang detected,numHangQ: 1, enableGen: 33
2025-12-12T10:16:30.343Z In(182) vmkernel: cpu33:2185809)NetPort: 1890: disabled port 0x#######
  • On the associated ESXi host, a vmkernel core dump can be found in /var/core:
vmkernel-zdump
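To check whether collected logs match this signature, a small scanner can look for the ring buffer alarm in the NSX Manager syslog and the vmxnet3 hang and disabled-port messages in the ESXi vmkernel log. This is a minimal Python sketch under stated assumptions: the regular expressions are modeled only on the sample lines in this article (whose identifiers are redacted), so real lines may differ between NSX and ESXi versions, and the function names `scan` and `matches_signature` are hypothetical.

```python
import re

# ISO-8601 timestamp as it appears in both the NSX syslog and vmkernel lines.
TS = r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)"

# Patterns modeled on the sample log lines above (assumption: formats may vary).
PATTERNS = {
    "tx_ring_overflow": re.compile(
        TS + r".*Edge NIC (?P<nic>\S+) transmit ring buffer has overflowed by "
        r"(?P<pct>[\d.]+)% on Edge node (?P<node>\S+)\. "
        r"The missed packet count is (?P<missed>\d+) and processed packet count is (?P<processed>\d+)"
    ),
    "vmxnet3_hang": re.compile(
        TS + r".*Vmxnet3: \d+: (?P<vnic>[^,]+),.*?portID\((?P<port_id>[^)]+)\): Hang detected"
    ),
    "port_disabled": re.compile(TS + r".*NetPort: \d+: disabled port (?P<port>\S+)"),
}


def scan(lines):
    """Return (kind, fields) for every line matching one of the known patterns."""
    events = []
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                events.append((kind, match.groupdict()))
                break  # a line is classified by the first pattern that matches
    return events


def matches_signature(events):
    """True if both the ring buffer alarm and a vmxnet3 hang were seen."""
    kinds = {kind for kind, _ in events}
    return {"tx_ring_overflow", "vmxnet3_hang"} <= kinds


sample = [
    'var/log/syslog.49:2025-12-14T13:54:49.237Z nsx-mgr NSX 3646788 - '
    '[nsx@6876 comp="nsx-edge" eventType="edge_nic_out_of_transmit_buffer"] '
    'Edge NIC fp-eth1 transmit ring buffer has overflowed by 57.607269% on Edge node '
    'edge-node-01. The missed packet count is 1234 and processed packet count is 5678.',
    "2025-12-12T10:16:30.337Z In(182) vmkernel: cpu33:2185809)Vmxnet3: 19375: "
    "edge-01.eth1,00:50:56:aa:bb:cc, portID(67108890): Hang detected,numHangQ: 1, enableGen: 33",
    "2025-12-12T10:16:30.343Z In(182) vmkernel: cpu33:2185809)NetPort: 1890: disabled port 0x4000123",
    "2025-12-12T10:16:31.000Z In(182) vmkernel: unrelated message",
]
events = scan(sample)
print([kind for kind, _ in events], matches_signature(events))
```

The sample input uses made-up counts, node names, and port IDs in place of the redacted values shown in this article.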

Environment

VMware NSX

VMware vSphere ESXi 

Cause

Tx Hang due to Interrupt Loss

Resolution

Fix:

  • For VMware vSphere ESXi 8.x, upgrade the icen driver used by the associated vmnics to version 2.2.2.0 or later.
  • For VMware vSphere ESXi 9.x, upgrade the icen driver used by the associated vmnics to version 2.2.3.0 or later.
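The minimum versions above can be checked programmatically once the installed icen version is known (for example, from the Driver Info section of `esxcli network nic get -n <vmnic>` on the host). The following minimal Python sketch compares versions numerically rather than as strings, so that 2.10.0.0 correctly counts as newer than 2.2.2.0. The `MIN_ICEN` table restates only the two thresholds given in this article; the function names are hypothetical, and stripping a VIB-style build suffix after "-" is an assumption about the version string format.

```python
# Minimum icen driver versions from this article, keyed by ESXi major version.
MIN_ICEN = {
    8: (2, 2, 2, 0),
    9: (2, 2, 3, 0),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.2.3.0' or '2.2.3.0-1OEM...' into a zero-padded integer tuple."""
    core = text.strip().split("-", 1)[0]          # drop any build suffix (assumption)
    parts = [int(p) for p in core.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))  # so '2.2.2' compares like '2.2.2.0'


def icen_needs_upgrade(esxi_major: int, icen_version: str) -> bool:
    """True if the installed icen driver is older than the fixed version."""
    if esxi_major not in MIN_ICEN:
        raise ValueError(f"no minimum icen version listed for ESXi {esxi_major}.x")
    return parse_version(icen_version) < MIN_ICEN[esxi_major]


print(icen_needs_upgrade(8, "1.12.5.0"))  # older than 2.2.2.0, so an upgrade is needed
```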