vSAN -- Alarm about high pNIC error rate being detected

Article ID: 312096

Products

  • VMware vSAN
  • VMware vSAN 7.x
  • VMware vSAN 8.x
  • VMware vCenter Server

Issue/Introduction

 
vSphere 7.0 U2 and later include a new vSAN health check, "High pNic error rate detected", which monitors certain counters of the physical network adapters (pNIC, vmnic) used for vSAN traffic.

If one or more counters reach or exceed their warning threshold, the vSphere Client shows an error message reporting a high rate for the specific counter.

Examples of where the alarm appears:

  • Summary page of vCenter Server, referring to one or more vSAN hosts

  • Summary page of the vSAN host

  • Triggered Alarms page of the vSAN host

  • vSAN Host --> Monitor --> vSAN --> Performance

  • Affected Host --> Monitor --> Events

 

Environment

vSAN 7.0 U2 and later.

 

Cause

The following table lists the counters and thresholds applied to the physical network adapters (pNICs) that carry vSAN traffic on a vSAN host.

Metric               Warning Threshold    Critical Threshold
Rx CRC Errors        > 0.1%               > 1%
Tx Carrier Errors    > 0.1%               > 1%
Rx Errors            > 0.1%               > 1%
Tx Errors            > 0.1%               > 1%
Rx/Tx Pause          > 1%                 > 10%
Rx Missed Errors     > 0.1%               > 1%
Rx Over Errors       > 0.1%               > 1%
Rx Fifo Errors       > 0.1%               > 1%
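
As an illustration of how the thresholds in the table are applied, the following Python sketch encodes them and classifies a given rate. The names THRESHOLDS and classify are hypothetical, and the strict "greater than" comparison follows the table. This is not the vSAN Health Service's actual implementation.

```python
# Thresholds from the table, in percent: (warning, critical).
# THRESHOLDS and classify() are hypothetical names used for illustration only.
THRESHOLDS = {
    "Rx CRC Errors": (0.1, 1.0),
    "Tx Carrier Errors": (0.1, 1.0),
    "Rx Errors": (0.1, 1.0),
    "Tx Errors": (0.1, 1.0),
    "Rx/Tx Pause": (1.0, 10.0),
    "Rx Missed Errors": (0.1, 1.0),
    "Rx Over Errors": (0.1, 1.0),
    "Rx Fifo Errors": (0.1, 1.0),
}


def classify(metric, rate_percent):
    """Return 'critical', 'warning', or 'ok' for a rate given in percent."""
    warning, critical = THRESHOLDS[metric]
    # Assumption: a rate must be strictly greater than the threshold,
    # as the table's ">" notation suggests.
    if rate_percent > critical:
        return "critical"
    if rate_percent > warning:
        return "warning"
    return "ok"
```

For example, classify("Rx/Tx Pause", 5.0) falls between the 1% and 10% thresholds and is reported as a warning, while the same 5.0 for "Rx Errors" would be critical.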


To evaluate the errors reported in the message (see the examples under Issue/Introduction), log in to the vSAN host via SSH (for example, with PuTTY) and run the following command, replacing vmnic# with the name of the affected adapter:

esxcli network nic stats get -n vmnic#

Sample Output:

[root@Host:~] esxcli network nic stats get -n vmnic4
NIC statistics for vmnic4
   Packets received: 17036123232
   Packets sent: 1708000566
   Bytes received: 2293939393939
   Bytes sent: 229320239320
   Receive packets dropped: 0
   Transmit packets dropped: 0
   Multicast packets received: 12333949
   Broadcast packets received: 29343487
   Multicast packets sent: 46123
   Broadcast packets sent: 14599
   Total receive errors: 0
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 4545
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0

[root@Host:~] esxcli network nic stats get -n vmnic4
NIC statistics for vmnic4:
      Packets received: 15043459
      Packets sent: 0
      Bytes received: 126708423293
      Bytes sent: 36416721
      Receive packets dropped: 0
      Transmit packets dropped: 0
      Multicast packets received: 468030126
      Broadcast packets received: 454874088
      Multicast packets sent: 418583
      Broadcast packets sent: 0
      Total receive errors: 13086
      Receive length errors: 13086
      Receive over errors: 0
      Receive CRC errors: 0
      Receive frame errors: 0
      Receive FIFO errors: 0
      Receive missed errors: 0
      Total transmit errors: 0
      Transmit aborted errors: 0
      Transmit carrier errors: 0
      Transmit FIFO errors: 0
      Transmit heartbeat errors: 0
      Transmit window errors: 0

The vSAN Health Service uses these counters to determine the error rate and to decide whether to raise the alarm.
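
For a quick manual estimate from this output, a small script can parse the counters and express one of them as a percentage of received packets. This is a minimal sketch: the function names are hypothetical, and the formula (error counter divided by "Packets received") is an assumption, since the exact calculation the health service performs is not stated here. The esxcli counters are running totals, so a ratio over the full totals may differ from what the health check evaluates.

```python
def parse_nic_stats(text):
    """Parse 'Name: value' lines from `esxcli network nic stats get` output."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skip the header line and anything that is not a numeric counter.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def rx_error_percent(stats, counter):
    """Assumed formula: receive counter / packets received * 100."""
    received = stats.get("Packets received", 0)
    if received == 0:
        return 0.0
    return 100.0 * stats.get(counter, 0) / received


# Excerpt of the second sample output shown in this article.
SAMPLE = """\
NIC statistics for vmnic4:
      Packets received: 15043459
      Total receive errors: 13086
      Receive length errors: 13086
      Receive missed errors: 0
"""

stats = parse_nic_stats(SAMPLE)
rate = rx_error_percent(stats, "Total receive errors")
print(f"Total receive errors: {rate:.3f}% of received packets")
```

The same helper works for any receive-side counter name that appears in the output, such as "Receive CRC errors" or "Receive missed errors".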

Resolution

All metrics monitored by this alarm describe conditions between the vmnic port and the physical switch port. If a metric exceeds the thresholds listed above, investigate the cause in the physical network.

If the alert persists, consult the hardware vendor or your networking team directly.

Note:

  • Some of these alarms are caused by packets with a frame length mismatch. The physical switch vendor and/or the NIC driver vendor may be able to confirm whether such an alarm can be safely ignored.
  • These issues indicate Layer 1 concerns. The health check only reports them; the alarm does not indicate an issue with vSAN itself.
  • If the vmnic reporting the error is the standby vmnic for vSAN connectivity, the alarm can be ignored. A standby vmnic is expected to miss a large percentage of receive packets, so an alarm on it does NOT indicate a real performance issue with vSAN.

Additional Information

Workaround:

Bring the vmnic down and then up again with the commands below, then monitor whether the high pNIC error rate alert reappears:

esxcli network nic down -n vmnicX

esxcli network nic up -n vmnicX
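
When monitoring after the reset, it can help to compare two snapshots of the counters rather than the running totals. A hedged sketch follows: the snapshot values are made up for illustration, the function name is hypothetical, and the rate formula is an assumption rather than the health service's documented method.

```python
def interval_error_percent(before, after, counter, total="Packets received"):
    """Percent of packets in the interval that hit `counter` (assumed formula)."""
    packets = after[total] - before[total]
    errors = after[counter] - before[counter]
    # No traffic in the interval, or the counters went backwards
    # (for example, after a reset): no meaningful rate can be computed.
    if packets <= 0 or errors < 0:
        return 0.0
    return 100.0 * errors / packets


# Hypothetical snapshots taken some time apart with
# `esxcli network nic stats get -n vmnicX`.
before = {"Packets received": 1_000_000, "Receive missed errors": 4_545}
after = {"Packets received": 1_200_000, "Receive missed errors": 5_045}

rate = interval_error_percent(before, after, "Receive missed errors")
print(f"Receive missed errors in interval: {rate:.2f}%")
```

With these made-up values, 500 new missed packets out of 200,000 received gives 0.25%, which would be above the 0.1% warning threshold for Rx Missed Errors.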