Intermittent vSAN Health Alarm: High pNIC Rx Generic error rate detected

Article ID: 439043

Updated On:

Products

VMware vSAN

Issue/Introduction

  • The vSAN alarm "High pNic error rate detected" can be triggered when certain physical NIC counters exceed their thresholds. See the KB "Alarm about high pNIC error rate being detected" for the monitored metrics and their thresholds.
  • The following alarm is triggered in vCenter Server: "High pNic Rx Generic error rate detected"

Environment

  • VMware vSAN 8.x
  • VMware vSAN 9.x
  • VMware Cloud Foundation

Cause

  • Receive checksum errors (csumErr) on random Rx queues of particular physical NICs cause this issue.
  • These checksum errors are visible when inspecting NIC statistics via the CLI, even if no performance degradation is observed, for example:

 

# esxcli network nic stats get -n vmnicX

NIC statistics for vmnic1:
 
Packets received: 4802380244
Total receive errors: 2244
 
Queue Statistics:
 
rxq0: totalPkts=1381473796 totalBytes=471928939608 nonEopDescs=0 allocRxBufFail=0 csumErr=1836
rxq1: totalPkts=810298599 totalBytes=1168073677028 nonEopDescs=0 allocRxBufFail=0 csumErr=408

 

  • These errors typically signify that the Network Interface Card (NIC) received packets that failed data integrity validation.

Hardware Offloading: This can occur if the NIC’s hardware offload engine improperly validates incoming packets or if the packets themselves were corrupted during transmission across the physical switch fabric.

Non-Harmful Protocols: Certain management or discovery protocols (e.g., Cisco DTP, WLCCP) may use non-standard frame structures that trigger checksum or length errors in the ESXi driver accounting, despite being harmless to the environment.
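
As a quick sanity check on the sample output above, the per-queue csumErr counters can be summed and compared with Total receive errors. The following is a minimal sketch in Python, assuming the output has been saved as text; the helper names are hypothetical, and the per-queue line format is driver-specific, so the regular expressions may need adjusting for other NIC drivers.

```python
import re

# Text copied from the sample output in this article. On a real host it would
# come from `esxcli network nic stats get -n vmnicX` (format varies by driver).
SAMPLE = """\
NIC statistics for vmnic1:
Packets received: 4802380244
Total receive errors: 2244
Queue Statistics:
rxq0: totalPkts=1381473796 totalBytes=471928939608 nonEopDescs=0 allocRxBufFail=0 csumErr=1836
rxq1: totalPkts=810298599 totalBytes=1168073677028 nonEopDescs=0 allocRxBufFail=0 csumErr=408
"""

def csum_errors_by_queue(text):
    """Return {queue_name: csumErr} for every rxq line found in the text."""
    errors = {}
    for line in text.splitlines():
        match = re.match(r"\s*(rxq\d+):.*\bcsumErr=(\d+)", line)
        if match:
            errors[match.group(1)] = int(match.group(2))
    return errors

def total_receive_errors(text):
    """Return the 'Total receive errors' counter, or None if it is absent."""
    match = re.search(r"Total receive errors:\s*(\d+)", text)
    return int(match.group(1)) if match else None

per_queue = csum_errors_by_queue(SAMPLE)
total = total_receive_errors(SAMPLE)
print(per_queue)                        # {'rxq0': 1836, 'rxq1': 408}
print(sum(per_queue.values()), total)   # 2244 2244
```

In this sample the two queue counters add up to the total, which suggests that all receive errors on this NIC are checksum errors rather than some other error class.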

 

Resolution

  1. Validation: If Total receive errors is low relative to Packets received (e.g., < 0.1%) and there is no measurable increase in vSAN latency, the alert can often be treated as informational.
  2. Physical Network Audit: If csumErr continues to increment rapidly, run a packet capture on the ESXi host (using pktcap-uw on the affected uplink) and engage the physical networking vendor to identify whether a specific device or protocol is sending malformed frames.
  3. Driver/Firmware Alignment: Ensure the NIC is on a supported driver/firmware combination as listed in the Broadcom Compatibility Guide.
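
The checks in steps 1 and 2 come down to simple arithmetic, sketched below in Python. The function names are hypothetical, the 0.1% figure is only the example value from step 1 and not an official limit, and the second csumErr reading is invented for illustration. The article does not define what counts as "rapidly", so the rate is reported without a threshold.

```python
def receive_error_ratio(packets_received, receive_errors):
    """Fraction of received packets counted as errors (0.0 if no traffic)."""
    if packets_received == 0:
        return 0.0
    return receive_errors / packets_received

def csum_err_rate(earlier, later, interval_seconds):
    """csumErr increments per second between two readings of the same queue."""
    return (later - earlier) / interval_seconds

# Step 1: counters taken from the sample output in this article.
EXAMPLE_THRESHOLD = 0.001  # the 0.1% example figure, not an official limit
ratio = receive_error_ratio(4802380244, 2244)
print(f"error ratio: {ratio:.7%}, below example threshold: {ratio < EXAMPLE_THRESHOLD}")

# Step 2: first reading is rxq0 from the sample; the second reading
# (1866, taken 300 seconds later) is a made-up value for illustration.
print(f"csumErr per second: {csum_err_rate(1836, 1866, 300)}")
```

For the sample counters the ratio is far below 0.1%, which matches the case where the alert can be treated as informational, provided vSAN latency is also unaffected.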

Additional Information