Edge NIC out of receive buffer alarm

Article ID: 330475

Products

VMware NSX

Issue/Introduction

Title: Alarm for Edge NIC out of receive buffer
Event ID: edge_health.edge_nic_out_of_receive_buffer
Alarm Description

  • Purpose: Indicates overflow of Edge NIC receive buffer.
  • Impact: Traffic is dropped, and the Rx misses counter keeps increasing.

Environment

VMware NSX-T Data Center
 
Edge Form factors:
  • Bare Metal Edge
  • VM Edge

Cause

The NIC out of receive buffer alarm can be raised when the following conditions are observed on one or all CPUs:

  • CPU usage is high, i.e., > 90%
  • RX pps is high
  • Bursty traffic

Resolution

Steps to resolve
For 3.0.0 and higher

Recommended Action:

Run the NSX CLI command 'get dataplane cpu stats' on the edge node and check:

  • If CPU usage is high, i.e., > 90%, take multiple samples of the logical router interface stats using the command 'get logical-router interface stats'. If an IPsec tunnel is enabled in the topology, also check the IPsec tunnel stats using the command 'get ipsecvpn tunnel stats'. Then analyze the stats to see whether the majority of the traffic consists of fragmented packets or IPsec packets. If it does, the alarm is expected behavior. If it does not, the datapath is probably busy with other operations. If the alarm lasts more than 2-3 minutes, contact Broadcom Support.
  • If CPU usage is not high, i.e., < 90%, check whether rx pps is high using the command 'get dataplane cpu stats' (to confirm that the traffic rate is increasing). Then increase the ring size by 1024 using the command 'set dataplane ring-size rx <ring-size>'.
    Note: Repeatedly increasing the ring size in steps of 1024 can lead to performance issues. If the issue persists even after the ring size has been increased, the Edge needs a larger form factor deployment to accommodate the traffic.
  • If the alarm keeps flapping, i.e., it triggers and resolves again very quickly, the likely cause is bursty traffic. In this case, check rx pps as described above. If rx pps is not high while the alarm is active, contact Broadcom Support. If rx pps is high, bursty traffic is confirmed; consider suppressing the alarm.
    Note: There is no specific benchmark for what counts as a high pps value. It depends on the infrastructure and the type of traffic. To make the comparison, note the pps value while the alarm is inactive and again while it is active.
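
The checks above can be summarized as a small decision sketch. This is an illustration only, not an NSX API or tool: the `EdgeSample` container, its field names, and the `recommend` function are hypothetical, and every value is assumed to be read manually from the CLI output. Several points are assumptions rather than statements from this article: rx pps is treated as "high" when the value noted during the active alarm exceeds the value noted while the alarm was inactive, a CPU usage of exactly 90% is treated as "not high", the flapping case is evaluated first, and the case of normal CPU usage with normal rx pps is reported as not covered.

```python
from dataclasses import dataclass

CPU_HIGH_PERCENT = 90.0  # threshold stated in the article
RING_SIZE_STEP = 1024    # ring size increment stated in the article


@dataclass
class EdgeSample:
    """Values an operator notes by hand from the CLI output (hypothetical container)."""
    cpu_usage_percent: float          # from 'get dataplane cpu stats'
    rx_pps_alarm_active: float        # rx pps noted while the alarm is active
    rx_pps_alarm_inactive: float      # rx pps noted while the alarm is inactive
    alarm_flapping: bool              # alarm triggers and resolves very quickly
    mostly_fragmented_or_ipsec: bool  # judged from interface / IPsec tunnel stats
    current_rx_ring_size: int         # rx ring size currently configured


def rx_pps_is_high(sample: EdgeSample) -> bool:
    # Assumption: the article gives no benchmark, so "high" here simply means
    # higher than the value noted while the alarm was inactive.
    return sample.rx_pps_alarm_active > sample.rx_pps_alarm_inactive


def recommend(sample: EdgeSample) -> str:
    """Return the action the article suggests for the given observations."""
    # Assumption: the flapping case is checked first.
    if sample.alarm_flapping:
        if rx_pps_is_high(sample):
            return "Bursty traffic confirmed; consider suppressing the alarm."
        return "Contact Broadcom Support."

    if sample.cpu_usage_percent > CPU_HIGH_PERCENT:
        if sample.mostly_fragmented_or_ipsec:
            return "Expected behavior."
        return "Datapath busy; contact Broadcom Support if the alarm lasts more than 2-3 minutes."

    if rx_pps_is_high(sample):
        new_size = sample.current_rx_ring_size + RING_SIZE_STEP
        return f"set dataplane ring-size rx {new_size}"

    # Not described in the article; returned as-is rather than guessed.
    return "Case not covered by the article."
```

For example, with made-up readings of 50% CPU usage, rx pps rising from 1000 to 2000, and a current rx ring size of 2048, `recommend` returns the command string `set dataplane ring-size rx 3072`.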

Maintenance window required for remediation? Yes