Troubleshooting NIC errors and other network traffic faults in ESXi

Article ID: 341594

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

The esxcli network nic stats get command may show counters that are greater than zero on one or more physical network adapters.

In the following example, vmnic2 is the NIC under investigation:
$ esxcli network nic stats get -n vmnic2

NIC statistics for vmnic2
   Packets received: 701280499176
   Packets sent: 687061948450
   Bytes received: 664124780523852
   Bytes sent: 676938646792793
   Receive packets dropped: 2452783244
   Transmit packets dropped: 0
   Multicast packets received: 976222150
   Broadcast packets received: 0
   Multicast packets sent: 0
   Broadcast packets sent: 0
   Total receive errors: 0
   Receive length errors: 0
   Receive over errors: 0
   Receive CRC errors: 0
   Receive frame errors: 0
   Receive FIFO errors: 0
   Receive missed errors: 0
   Total transmit errors: 0
   Transmit aborted errors: 0
   Transmit carrier errors: 0
   Transmit FIFO errors: 0
   Transmit heartbeat errors: 0
   Transmit window errors: 0

Notes about esxcli network nic stats output:

  • Run the uptime command to see over what time frame the counters have accumulated.
  • There is no command to clear the network counters on a running host. Unloading and reloading the NIC device driver is possible, but is not recommended because it may produce unpredictable results on the host.
  • To reset the counters, place the host into Maintenance Mode and reboot it.
  • Because the counters are cumulative, a single reading cannot show when, or because of which event, they were incremented.
  • To monitor the counters, use the "watch" command and look for values that keep increasing:

$ watch esxcli network nic stats get -n vmnic2
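
Because the counters are cumulative, comparing two readings taken some time apart shows which counters are still increasing. The following is a minimal Python sketch of that comparison, assuming the output of esxcli network nic stats get has been saved as text twice. The function names and the two sample snapshots are hypothetical and only illustrate the idea.

```python
import re

def parse_nic_stats(text):
    """Parse 'esxcli network nic stats get' output into {counter name: value}."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats

def counter_deltas(before, after):
    """Return only the counters that increased between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }

# Hypothetical snapshots, shortened for illustration.
snapshot_1 = """NIC statistics for vmnic2
   Packets received: 1000
   Receive packets dropped: 10
   Receive CRC errors: 0
"""
snapshot_2 = """NIC statistics for vmnic2
   Packets received: 1500
   Receive packets dropped: 12
   Receive CRC errors: 0
"""

deltas = counter_deltas(parse_nic_stats(snapshot_1), parse_nic_stats(snapshot_2))
print(deltas)
```

Counters that do not appear in the result did not change between the two readings.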

In a healthy environment, "errors" should either be zero, or very small as a percentage of the overall total. If errors are present, the hardware vendor should be consulted.
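
To judge whether a counter is "very small as a percentage of the overall total", the ratio can be worked out directly. The sketch below uses the vmnic2 values from the example output above and assumes that "Packets received" is the appropriate total to compare against. The function name is hypothetical.

```python
def counter_percentage(counter_value, total_packets):
    """Express a drop or error counter as a percentage of total packets."""
    if total_packets == 0:
        return 0.0
    return 100.0 * counter_value / total_packets

# Values taken from the vmnic2 example output in this article.
packets_received = 701280499176
receive_packets_dropped = 2452783244

pct = counter_percentage(receive_packets_dropped, packets_received)
print(f"Receive drops: {pct:.2f}% of packets received")
```

For this example the result is roughly 0.35 percent. Whether that is acceptable depends on the workload and is a question for the hardware vendor.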

Environment

VMware ESXi Version: 7.x
VMware ESXi Version: 8.x

Cause

Receive packets dropped: This counter is often a combination of other counters that can be found in the "Private statistics" section of the "nicinfo" .txt file that is contained in the "commands" directory of ESXi host log bundles. 

  • Example:  

   NIC statistics for vmnic0:
      Packets received: 1880190132
      Packets sent: 1887598404
      Bytes received: 264206918890
      Bytes sent: 269243305508
      Receive packets dropped: 11592374 <====== THIS IS OF INTEREST ======

  • In the above log bundle, in the "Private statistics" section under vmnic0, we see:

[rxq1] discards rx: 11585755

[rxq2] discards rx: 6619

  • Adding 11585755 + 6619 gives 11592374, which matches the value of the "Receive packets dropped" counter.
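
The same check can be scripted. This is a rough Python sketch that sums the per-queue discard lines from the example and compares the result with the "Receive packets dropped" value. The "[rxqN] discards rx" line format is taken from this example only; private statistics names differ between drivers, so the pattern is an assumption that needs adjusting for other NICs.

```python
import re

def sum_rx_discards(private_stats_text):
    """Sum all '[rxqN] discards rx: <value>' lines in a private statistics dump."""
    pattern = re.compile(r"\[rxq\d+\]\s+discards rx:\s*(\d+)")
    return sum(int(value) for value in pattern.findall(private_stats_text))

# Lines copied from the vmnic0 example above.
private_stats = """
[rxq1] discards rx: 11585755
[rxq2] discards rx: 6619
"""

receive_packets_dropped = 11592374
total_discards = sum_rx_discards(private_stats)
print(total_discards, total_discards == receive_packets_dropped)
```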

 

Receive length errors: According to the Intel hardware manual, this register records "Number of packets with receive length errors. A length error occurs if an incoming packet length field in the MAC header doesn't match the packet length."

Receive missed errors: Receive misses are caused by the NIC running out of hardware descriptors in which to store incoming packets.

Receive over errors: Packets that are discarded by the hardware buffer of the card.

Total receive errors: CRC errors plus the other errors mentioned above.

FIFO or Missed errors (one or the other, not necessarily both): These increment and accumulate if the physical NIC cannot handle the peak load of incoming packets with its current Rx ring buffer size.
 

More about CRC/CRC errors:

CRC: CRC stands for "Cyclic Redundancy Check".

CRC error: The FCS (Frame Check Sequence) field of a frame contains a 4-byte CRC value used for error checking.

When a source host assembles a packet, it performs a CRC calculation on all fields in the packet except the Preamble, SFD (Start Frame Delimiter), and FCS using a predetermined algorithm.

The source host stores the value in the FCS field and transmits it as part of the packet.

When the packet is received by the destination host, it performs a CRC test again by using the same algorithm.

If the CRC value calculated at the destination host does not match the value in the FCS field, the destination host discards the packet, considering this as a CRC Error.
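
The sender and receiver steps described above can be illustrated with a short Python sketch. It assumes that the CRC-32 implemented by Python's zlib.crc32 matches the Ethernet FCS algorithm and that the FCS is appended least-significant byte first. The frame contents and function names are hypothetical, and real NICs perform this check in hardware.

```python
import struct
import zlib

def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Sender side: compute the CRC over the frame and append it as a 4-byte FCS."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)

def fcs_is_valid(frame_with_fcs: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it with the received FCS."""
    payload, received_fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    expected_fcs = struct.pack("<I", zlib.crc32(payload) & 0xFFFFFFFF)
    return expected_fcs == received_fcs

# Hypothetical frame: destination MAC, source MAC, EtherType, padded payload.
header = bytes.fromhex("ffffffffffff" "005056000001" "0800")
frame = header + b"example payload".ljust(46, b"\x00")

sent = append_fcs(frame)

# Simulate a single bit flipped in transit, e.g. by a faulty cable or transceiver.
corrupted = bytearray(sent)
corrupted[20] ^= 0x01
corrupted = bytes(corrupted)

print(fcs_is_valid(sent))       # True
print(fcs_is_valid(corrupted))  # False
```

The second frame is the kind the destination would discard and count as a CRC error.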

Resolution

If any NICs show errors, consult with the hardware vendor to troubleshoot the physical NIC errors.

  1. Check that the driver/firmware of the vmnic is up to date. To check the driver, follow Determining Network/Storage firmware and driver version in ESXi (323110). For more information on drivers and firmware, see FAQ: Recommendation for Driver/Firmware (318542).
  2. It is possible to mitigate FIFO or Missed errors by increasing the Rx buffer ring size on the physical NIC.

Note: These changes impact network adapter performance and must be validated by the hardware vendor prior to implementing the change.

FIFO or Missed errors will increment if the physical NIC is not able to address the peak load of incoming packets with its assigned ring buffer size. Use the following commands to check the maximum (preset) ring buffer size and current ring buffer size.

$ esxcli network nic ring preset get -n vmnicX
$ esxcli network nic ring current get -n vmnicX
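
As an illustration, the outputs of the two commands can be compared to see how much room there is between the current and the maximum ring sizes. The Python sketch below assumes the output resembles the Broadcom BCM57416 example shown in the notes in this article. The function names are hypothetical, and the exact output format may vary between ESXi versions and drivers.

```python
import re

def parse_ring_info(text):
    """Parse ring size output into {field name: value}, e.g. {'RX': 1023}."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z ]+):\s*(\d+)\s*$", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values

def ring_headroom(current_text, preset_text):
    """Return how far the current RX/TX ring sizes are below the preset maximums."""
    current = parse_ring_info(current_text)
    preset = parse_ring_info(preset_text)
    return {name: preset["Max " + name] - current[name] for name in ("RX", "TX")}

# Sample values from the BCM57416 example in this article.
current_output = """RingInfo:
   RX: 1023
   RX Mini: 0
   RX Jumbo: 0
   TX: 1023
"""
preset_output = """RingInfo:
   Max RX: 4095
   Max RX Mini: 0
   Max RX Jumbo: 0
   Max TX: 4095
"""

headroom = ring_headroom(current_output, preset_output)
print(headroom)
```

A headroom of zero for both rings means the adapter is already at its preset maximum.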

NOTES REGARDING PHYSICAL UPLINK RING BUFFERS:

1) A ring buffer is a section of memory allocated as a temporary holding area for packets that arrive at a rate so high that the code processing them has trouble keeping up.

2) Each type of network adapter has a "preset maximum" which is determined by the device driver.  This is revealed by the above command featuring "preset".

3) This "preset maximum" is usually higher than the default setting which is allocated when ESXi is installed from scratch.  The "current" setting is revealed by the above command featuring "current".

4) Here is an example for the "Broadcom BCM57416 NetXtreme-E 10GBASE-T RDMA Ethernet Controller" adapter:

   Current Ring Size:
   RingInfo:
      RX: 1023
      RX Mini: 0
      RX Jumbo: 0
      TX: 1023

   Preset Maximum Ring Size:
   RingInfo:
      Max RX: 4095
      Max RX Mini: 0
      Max RX Jumbo: 0
      Max TX: 4095

5) In this case, you can set the uplink ring buffer sizes to their allowed maximums using this command:

$ esxcli network nic ring current set -n vmnicN -r 4095 -t 4095

(where N is the vmnic number; for example, vmnic5)

6) As a general rule, it is a best practice to have these settings be consistent among all ESXi hosts in your environment with the same adapter type.  

(This is so that you will have similar configurations regardless of where VM workloads are running.)

7) As NIC and physical switch speeds increase, and as applications generate higher volumes of TCP/IP traffic at higher packet rates, it is becoming more common to see situations where increasing the physical uplink ring buffer sizes to the preset maximums reduces the likelihood of drivers being unable to cope with high packet arrival rates.
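
To give a feel for why ring size matters at high packet rates, the following back-of-the-envelope Python sketch estimates how long a receive ring can absorb a burst before it fills. It assumes a worst case that is not taken from this article: minimum-size 64-byte frames arriving at full 10 Gbit/s line rate while the host drains nothing from the ring. The function names are hypothetical, and real behaviour depends on the driver and workload.

```python
def line_rate_pps(link_bps, frame_bytes=64):
    """Maximum packets per second on an Ethernet link for a given frame size.

    Each frame occupies 20 extra bytes on the wire:
    7 bytes preamble + 1 byte SFD + 12 bytes inter-frame gap.
    """
    return link_bps / ((frame_bytes + 20) * 8)

def burst_absorb_time_us(ring_size, packets_per_second):
    """Microseconds until a ring of ring_size descriptors fills if nothing is drained."""
    return ring_size / packets_per_second * 1e6

pps = line_rate_pps(10e9)  # 10 Gbit/s link, 64-byte frames
small_ring_us = burst_absorb_time_us(1023, pps)
large_ring_us = burst_absorb_time_us(4095, pps)

print(f"Line rate: {pps:,.0f} packets/s")
print(f"RX ring 1023: {small_ring_us:.1f} us")
print(f"RX ring 4095: {large_ring_us:.1f} us")
```

Under these assumptions, the larger ring gives the driver about four times as long to start processing packets before the hardware has to drop them.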

 

Additional Information

Increasing the Rx and Tx values on the hardware side and within the guest OS can significantly enhance VM performance, especially in high I/O environments. However, the effectiveness of this adjustment depends on factors such as the application, operating system, and hardware in use. Therefore, the recommendations should come from the respective application, OS, and hardware vendors to ensure compatibility and optimal performance.