Intermittent triggering of vSAN alarm "High pNic error rate detected". Mostly, or only, receive length errors reported in NIC stats.

Article ID: 382166

Updated On:

Products

VMware vSAN

Issue/Introduction

vSAN alarm "High pNic error rate detected" can be triggered if certain physical NIC metric counter thresholds are surpassed.  See the following KB for more detail regarding the metrics monitored and their thresholds:

Receive (Rx) length errors are rolled up into Total Receive (Rx) Errors.  Total receive (Rx) errors can therefore surpass the alarm threshold when receive (Rx) length errors accumulate over time.  In some environments, increases in the receive (Rx) length error counter do not necessarily indicate an unhealthy physical network.

Environment

VMware vSAN 7.x
VMware vSAN 8.x

Cause

Physical networking statistics for the NIC(s) backing the vSAN vmk interface can be obtained with:

  • esxcli network nic stats get -n <vmnic interface>

A review of the statistics may show a value of 0, or a very low value, for every monitored receive (Rx) error metric except receive (Rx) length errors.  In some cases the following condition is seen:

  • Total Receive errors = Receive length errors

Example:

NIC statistics for vmnic0:
      Packets received: 7710488742
      Packets sent: 6485936294
      Bytes received: 2782530839734
      Bytes sent: 1570145833390
      Receive packets dropped: 212498
      Transmit packets dropped: 0
      Multicast packets received: 406655653
      Broadcast packets received: 1675716300
      Multicast packets sent: 470456
      Broadcast packets sent: 54219
      Total receive errors: 770686
      Receive length errors: 770686
      Receive over errors: 0
      Receive CRC errors: 0
      Receive frame errors: 0
      Receive FIFO errors: 0
      Receive missed errors: 0
      Total transmit errors: 0
      Transmit aborted errors: 0
      Transmit carrier errors: 0
      Transmit FIFO errors: 0
      Transmit heartbeat errors: 0
      Transmit window errors: 0

Notice that receive (Rx) length errors is the only receive (Rx) metric contributing to the Total receive (Rx) errors counter.
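The same check can be scripted against saved command output. The following is a minimal Python sketch, not a VMware tool: it assumes the output has the "Label: value" layout shown in the example above, and the function names parse_nic_stats and length_error_share are hypothetical helpers introduced here for illustration.

```python
import re

# Excerpt of the example output above (receive error counters only).
SAMPLE = """\
NIC statistics for vmnic0:
      Total receive errors: 770686
      Receive length errors: 770686
      Receive over errors: 0
      Receive CRC errors: 0
      Receive frame errors: 0
      Receive FIFO errors: 0
      Receive missed errors: 0
"""


def parse_nic_stats(text):
    """Collect 'Label: integer' lines into a dict; other lines are skipped."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def length_error_share(stats):
    """Fraction of 'Total receive errors' made up of 'Receive length errors'."""
    total = stats.get("Total receive errors", 0)
    length = stats.get("Receive length errors", 0)
    return length / total if total else 0.0


stats = parse_nic_stats(SAMPLE)
print(f"Receive length errors share: {length_error_share(stats):.0%}")
```

For the example above the share is 100%, which matches the condition Total Receive errors = Receive length errors.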

Resolution

Network packets whose frame length field differs from the actual payload size are counted under the receive (Rx) length error metric.  Some proprietary networking protocols use this frame structure by design.  Examples are Cisco WLCCP, Cisco DTP, and some STP packets.  These packets are not corrupted and are not harmful to physical networking performance.  If needed, the customer may engage the physical networking vendor to confirm the origin of these packets.

The intermittent triggering of the "High pNic error rate detected" vSAN alarm can be safely ignored if there is no measurable vSAN network performance degradation being reported and if receive (Rx) length errors are the primary metric contributing to the Total receive (Rx) metric threshold violation.
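One way to judge whether receive (Rx) length errors are the primary contributor is to compare two samples of the NIC counters taken some time apart and look at which counter accounts for the increase. The sketch below is illustrative only. It assumes the counters are cumulative and have already been read into dictionaries keyed by the labels printed by esxcli. The function name length_errors_dominate and the 0.9 cutoff are assumptions made for this example, not values documented by VMware. The sample figures in the usage lines are hypothetical.

```python
def length_errors_dominate(before, after, min_share=0.9):
    """Return True if length errors account for most of the Rx error increase.

    before / after: dicts of cumulative counters from two samples.
    min_share: hypothetical cutoff (an assumption, not a VMware threshold).
    """
    delta_total = (after.get("Total receive errors", 0)
                   - before.get("Total receive errors", 0))
    delta_length = (after.get("Receive length errors", 0)
                    - before.get("Receive length errors", 0))
    if delta_total <= 0:
        # No new receive errors between the two samples.
        return False
    return delta_length / delta_total >= min_share


# Hypothetical samples: every new receive error is a length error.
earlier = {"Total receive errors": 770000, "Receive length errors": 770000}
later = {"Total receive errors": 770686, "Receive length errors": 770686}
print(length_errors_dominate(earlier, later))
```

A result of True only says that length errors dominate the increase. Whether the alarm can be ignored still depends on there being no measurable vSAN network performance degradation.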

Of course, should vSAN performance degradation be negatively impacting VM production, it is recommended to review vSAN performance data to investigate if the potential bottleneck is at the physical network/NIC level or elsewhere in the vSAN stack.  Reference: