vSAN -- Alarm about high pNIC error rate being detected

Article ID: 312096


Updated On:

Products

VMware vSAN, VMware vSAN 7.x, VMware vSAN 8.x, VMware vCenter Server

Issue/Introduction

 
Starting with vSphere 7.0 U2, a new health check, "High pNic error rate detected", monitors certain counters of the physical network cards (pNIC, vmnic) used for vSAN traffic.
 
If one or more counters reach or exceed the warning threshold, the vSphere Web Client shows an error message reporting a high rate for the specific counter.

Examples:

  • Summary Page of vCenter referring to one or more vSAN Host(s)
  • Summary Page of vSAN Host
  • Triggered Alarms Page of vSAN Host
  • vSAN Host --> Monitor --> vSAN --> Performance

Environment

vSAN 7.0 U2 & later

 

Cause

The following table shows the counters and thresholds applied to the physical network cards (pNICs) that carry vSAN traffic on a vSAN host.

Metric               Warning Threshold   Critical Threshold
Rx CRC Errors        >0.1%               >1%
Tx Carrier Errors    >0.1%               >1%
Rx Errors            >0.1%               >1%
Tx Errors            >0.1%               >1%
Rx/Tx Pause          >1%                 >10%
Rx Missed Errors     >0.1%               >1%
Rx Over Errors       >0.1%               >1%
Rx Fifo Errors       >0.1%               >1%
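
As an illustration of how the thresholds in the table above could be applied, the following minimal Python sketch maps an error rate (in percent) to a severity level. The names THRESHOLDS and classify are hypothetical and not part of any VMware tool, and the sketch assumes the rate has already been calculated as a percentage.

```python
# Thresholds in percent, copied from the table above: (warning, critical).
# THRESHOLDS and classify() are illustrative names, not a VMware API.
THRESHOLDS = {
    "Rx CRC Errors": (0.1, 1.0),
    "Tx Carrier Errors": (0.1, 1.0),
    "Rx Errors": (0.1, 1.0),
    "Tx Errors": (0.1, 1.0),
    "Rx/Tx Pause": (1.0, 10.0),
    "Rx Missed Errors": (0.1, 1.0),
    "Rx Over Errors": (0.1, 1.0),
    "Rx Fifo Errors": (0.1, 1.0),
}


def classify(metric: str, rate_percent: float) -> str:
    """Return 'critical', 'warning' or 'ok' for a rate given in percent."""
    warning, critical = THRESHOLDS[metric]
    if rate_percent > critical:
        return "critical"
    if rate_percent > warning:
        return "warning"
    return "ok"


print(classify("Rx Missed Errors", 0.5))  # above 0.1%, below 1%
print(classify("Rx/Tx Pause", 0.5))       # below the 1% warning threshold
```

Note that Rx/Tx Pause uses thresholds ten times higher than the other counters, so the same rate can be a warning for one metric and acceptable for another.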

 

To evaluate the errors reported in the message (see the examples under Issue/Introduction), log in to the vSAN host via SSH (for example, with PuTTY) and run the following command, replacing vmnic# with the name of the affected NIC:

esxcli network nic stats get -n vmnic#
 
Sample Output:
 

[root@Host:~] esxcli network nic stats get -n vmnic4
NIC statistics for vmnic4
   Packets received: 17036123232
   Packets sent: 1708000566
   Bytes received: 2293939393939
   Bytes sent: 229320239320
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Multicast packets received: 12333949
   Broadcast packets received: 29343487
   Multicast packets sent: 46123
   Broadcast packets sent: 14599
   Total receive errors: 0
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 4545
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0

The vSAN Health Service uses these counters to determine the error rate and to decide whether to raise the alarm.
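
To get a rough feel for the numbers, the counters can be turned into a percentage by hand. The Python sketch below parses saved output of the command above and divides an error counter by the matching packet counter. This article does not state the exact formula or sampling window the health service uses, so the ratio is an assumption. The function names are hypothetical, and the service may well evaluate counter changes over an interval rather than lifetime totals.

```python
import re

# Excerpt of the sample output shown above (values copied unchanged).
SAMPLE = """\
NIC statistics for vmnic4
   Packets received: 17036123232
   Packets sent: 1708000566
   Total receive errors: 0
   Receive missed errors: 4545
   Total transmit errors: 0
"""


def parse_nic_stats(text: str) -> dict:
    """Collect every 'Name: <integer>' line into a dict of counters."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def error_percent(stats: dict, error_key: str, packets_key: str) -> float:
    """ASSUMPTION: rate = error counter / packet counter, in percent."""
    packets = stats.get(packets_key, 0)
    if packets == 0:
        return 0.0  # no traffic counted, so no meaningful rate
    return 100.0 * stats.get(error_key, 0) / packets


stats = parse_nic_stats(SAMPLE)
rate = error_percent(stats, "Receive missed errors", "Packets received")
print(f"Receive missed errors: {rate:.7f}% of packets received")
```

Under this assumption, the 4545 missed packets in the sample are a tiny fraction of the roughly 17 billion packets received, far below the 0.1% warning threshold. Lifetime totals like these can hide a recent burst of errors, so comparing two samples taken some minutes apart is likely to be more telling.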

Resolution

All metrics monitored by this alarm describe conditions between the vmnic port and the physical switch port. If these metrics exceed the thresholds listed above, investigate the cause in the physical network.

Note:

  • Some of these alarms are caused by special packets with a network frame length mismatch. The physical switch vendor or the NIC driver vendor may be able to confirm whether they can be safely ignored.
  • These issues indicate Layer 1 (physical layer) problems. The health check only reports them; the alarm does not indicate an issue with vSAN itself.

Additional Information