Title: Alarm for Edge NIC out of receive buffer
Event ID: edge_health.edge_nic_out_of_receive_buffer
Alarm Description
Purpose: Indicates overflow of Edge NIC receive buffer.
Impact: Traffic drops will be observed, and the Rx misses counter will keep increasing.
Environment
VMware NSX-T Data Center
Edge Form factors:
Bare Metal Edge
VM Edge
Cause
The NIC out of receive buffer alarm can be raised if the following conditions are observed on one or all CPUs:
CPU usage is high, i.e., > 90%
RX pps is high
Bursty traffic
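The relationship between these conditions and the receive ring can be illustrated with simple arithmetic. The sketch below is a simplified model, not NSX code: it assumes an initially empty ring and constant arrival and drain rates, and the function name and all numbers are hypothetical. It estimates how long a ring can absorb a burst when packets arrive faster than a busy datapath can drain them.

```python
def seconds_until_overflow(ring_size, arrival_pps, drain_pps):
    """Estimate how long an initially empty ring absorbs a burst.

    ring_size   -- number of receive descriptors in the ring
    arrival_pps -- packets per second arriving at the NIC
    drain_pps   -- packets per second the datapath removes from the ring

    Returns None when the datapath drains at least as fast as packets
    arrive, because the ring never fills in this simplified model.
    """
    if ring_size <= 0:
        raise ValueError("ring_size must be positive")
    backlog_rate = arrival_pps - drain_pps  # descriptors accumulating per second
    if backlog_rate <= 0:
        return None
    return ring_size / backlog_rate


# Illustrative numbers only, not measured on any Edge node.
burst = seconds_until_overflow(4096, 1_500_000, 1_000_000)
steady = seconds_until_overflow(4096, 900_000, 1_000_000)
print(burst)   # roughly 0.0082 s, i.e. the ring fills in about 8 ms
print(steady)  # None: the ring keeps up
```

In this model, a higher arrival rate (high RX pps or a burst) or a lower drain rate (a CPU that is already busy) both shorten the time before packets are missed, while a larger ring only lengthens it.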
Resolution
Steps to resolve for 3.0.0 and higher
Recommended Action:
Run the NSX CLI command 'get dataplane cpu stats' on the edge node and check:
If CPU usage is high, i.e., > 90%, take multiple samples of the logical router interface stats using the command 'get logical-router interface stats'. If an IPsec tunnel is enabled in the topology, also check the IPsec tunnel stats using the command 'get ipsecvpn tunnel stats'. Then analyze the stats to see whether the majority of the traffic is fragmented packets or IPsec packets. If it is, this is expected behavior. If not, the datapath is probably busy with other operations. If the alarm lasts more than 2-3 minutes, contact Broadcom Support.
If CPU usage is not high, i.e., < 90%, check whether rx pps is high using the command 'get dataplane cpu stats' (to confirm that the traffic rate is increasing). Then increase the ring size by 1024 using the command 'set dataplane ring-size rx <ring-size>'. Note: Repeatedly increasing the ring size in steps of 1024 can lead to some performance issues. If the issue persists even after increasing the ring size, it is an indication that the Edge needs a larger form factor deployment to accommodate the traffic.
If the alarm keeps flapping, i.e., it triggers and resolves very quickly, the likely cause is bursty traffic. In this case, check whether rx pps is high as described above. If it is not high during the alarm active period, contact Broadcom Support. If pps is high, this confirms bursty traffic; consider suppressing the alarm. Note: There is no specific benchmark for what counts as a high pps value. It depends on the infrastructure and the type of traffic. A comparison can be made by noting the pps values when the alarm is inactive and when it is active.
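The decision flow in the steps above can be summarized in a short sketch. This is one reading of the article rather than an official tool: the function names, parameters, and return strings are hypothetical, the 3-minute cutoff takes the upper end of the "2-3 minutes" guidance, and the 1.5x factor used to call pps "high" is an arbitrary placeholder, since no benchmark exists. The inputs are values an operator would read manually from the CLI commands named above.

```python
def pps_is_high(active_pps, inactive_pps, factor=1.5):
    """Compare rx pps while the alarm is active against an inactive baseline.

    `factor` is a hypothetical placeholder; pick a value that fits the
    infrastructure and traffic type.
    """
    return active_pps > inactive_pps * factor


def triage(cpu_pct, rx_pps_high, mostly_frag_or_ipsec=False,
           flapping=False, alarm_minutes=0.0):
    """Map one set of observations to the action the steps suggest."""
    if flapping:
        # Alarm triggers and resolves very quickly.
        if rx_pps_high:
            return "bursty traffic: consider suppressing the alarm"
        return "contact Broadcom Support"
    if cpu_pct > 90:
        if mostly_frag_or_ipsec:
            return "expected behavior"
        # Datapath is probably busy with other operations.
        if alarm_minutes > 3:  # assumption: upper end of "2-3 minutes"
            return "contact Broadcom Support"
        return "datapath busy: keep monitoring"
    if rx_pps_high:
        return "increase rx ring size by 1024"
    return "no action defined in this article"


def next_ring_size(current):
    """Ring size to pass to 'set dataplane ring-size rx <ring-size>'."""
    return current + 1024


# Example observations (made-up values).
print(triage(cpu_pct=95, rx_pps_high=True, mostly_frag_or_ipsec=True))
print(triage(cpu_pct=60, rx_pps_high=pps_is_high(900_000, 400_000)))
print(next_ring_size(2048))
```

The flapping check comes first because the article treats a rapidly triggering and resolving alarm as its own case, regardless of the CPU reading at any single moment.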