Error stats for pnic reported in the hostd logs

Products

VMware vSphere ESXi

Issue/Introduction

You see messages similar to the following logged regularly in /var/run/log/hostd.log on an ESXi host:

YYYY-MM-DDTHH:MM:SS.###<Time_Zone> warning hostd[2100510] [Originator@6876 sub=Statssvc] Error stats for pnic: vmnic1
--> droppedRx: 4640800
--> errorsRx: 6
--> RxCRCErrors: 6

You want to know how to interpret these messages and how to resolve the underlying issue.

Environment

VMware vSphere ESXi

Cause

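These log messages report counter values that track how many packets have been dropped or received with errors since the host was last powered on. The values are cumulative, so a large value on its own does not show whether packets are still being dropped or received in error.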
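
To see whether the counters are still increasing, compare successive entries for the same vmnic. The following Python sketch assumes the exact log layout shown in the example above (a header line ending in "Error stats for pnic: <vmnic>" followed by "--> <counter>: <value>" lines); real log lines may carry extra prefixes. It collects the entries and reports how much each counter grew between them. The function names are hypothetical helpers, not part of any VMware tool.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Assumed layout, copied from the example above: a header line ending in
# "Error stats for pnic: <vmnic>" followed by "--> <counter>: <value>" lines.
HEADER_RE = re.compile(r"^(?P<ts>\S+) .*Error stats for pnic: (?P<nic>\S+)\s*$")
COUNTER_RE = re.compile(r"^--> (?P<name>\w+): (?P<value>\d+)\s*$")

Entry = Tuple[str, str, Dict[str, int]]  # (timestamp, vmnic, counters)


def parse_pnic_error_stats(lines: List[str]) -> List[Entry]:
    """Collect every 'Error stats for pnic' entry and its counter values."""
    entries: List[Entry] = []
    current = None
    for line in lines:
        header = HEADER_RE.match(line)
        if header:
            current = (header["ts"], header["nic"], {})
            entries.append(current)
            continue
        counter = COUNTER_RE.match(line)
        if counter and current is not None:
            current[2][counter["name"]] = int(counter["value"])
        else:
            current = None  # any other line ends the current entry
    return entries


def counter_deltas(entries: List[Entry]) -> Iterator[Entry]:
    """Yield how much each counter grew since the previous entry for the same vmnic.

    A negative delta means the counters were reset (for example, by a reboot).
    """
    previous: Dict[str, Dict[str, int]] = {}
    for ts, nic, counters in entries:
        if nic in previous:
            before = previous[nic]
            yield ts, nic, {name: value - before.get(name, 0) for name, value in counters.items()}
        previous[nic] = counters
```

To use it, pass the lines of a copy of hostd.log, for example `parse_pnic_error_stats(open(path).read().splitlines())`. A delta of zero means no new drops or errors were counted in that interval, however large the absolute value is.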

errorsRx: An aggregate counter that increments each time a packet is received with an error. To find the specific reason why packets are counted as errored, you may need to examine the additional counters described in the Resolution section. In the log example above, however, the 'errorsRx' value matches the 'RxCRCErrors' value, so we know that the packets are being received with unexpected checksum (CRC) values.
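
The comparison made above (an aggregate counter matched against more specific counters) can be sketched in Python as follows. This is a minimal illustration, assuming each errored packet is counted in at most one specific counter; the function name is hypothetical, and the counter names you pass in depend on what your driver reports.

```python
from typing import Dict, List


def explaining_counters(aggregate: int, specific: Dict[str, int]) -> List[str]:
    """Return the specific counters that account for an aggregate counter's value.

    Assumes each errored packet is counted in at most one specific counter.
    An empty list means the available counters do not fully explain the aggregate.
    """
    # A single specific counter equal to the aggregate explains it on its own.
    exact = [name for name, value in specific.items() if value and value == aggregate]
    if exact:
        return exact
    # Otherwise, check whether the non-zero specific counters add up to the aggregate.
    nonzero = {name: value for name, value in specific.items() if value}
    if nonzero and sum(nonzero.values()) == aggregate:
        return sorted(nonzero)
    return []


# Values from the hostd log example: errorsRx 6, RxCRCErrors 6.
print(explaining_counters(6, {"RxCRCErrors": 6}))  # ['RxCRCErrors']
```

The same check applies to 'droppedRx' once the more specific drop counters have been gathered.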

RxCRCErrors: Increments each time a packet is received whose CRC (checksum) does not match its contents. CRC errors usually indicate that packets are being corrupted somewhere along the data path to the receiving host.
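
A CRC lets the receiver detect corruption: the sender computes a CRC-32 over the frame and appends it, and the receiving NIC recomputes it and counts a CRC error if the two values differ. The following Python sketch illustrates this with `zlib.crc32`, which uses the same CRC-32 polynomial as the Ethernet frame check sequence. The frame contents and the corrupted bit position are hypothetical, and real NICs perform this check in hardware over the whole frame.

```python
import zlib


def fcs(frame: bytes) -> int:
    """CRC-32 of the frame bytes (same polynomial as the Ethernet frame check sequence)."""
    return zlib.crc32(frame) & 0xFFFFFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating corruption in transit."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(corrupted)


frame = bytes(range(64))          # hypothetical 64-byte frame
sent_fcs = fcs(frame)             # computed by the sender and carried with the frame
received = flip_bit(frame, 100)   # one bit damaged somewhere along the path
print("CRC matches:", fcs(received) == sent_fcs)  # CRC matches: False
```

Because CRC-32 detects every single-bit error, the receiver always notices this kind of damage; it cannot tell where on the path the corruption happened, which is why the physical path has to be investigated.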

droppedRx: An aggregate counter that increments each time a received packet is dropped at the pNIC. To find the specific reason why packets are being dropped, you may need to examine the additional counters described in the Resolution section.

Resolution

The first step in understanding and troubleshooting these counters is to consult the following KB, which explains each counter in detail and describes how to gather the counter data from a host:

Troubleshooting NIC errors and other network traffic faults in ESXi

If any NICs show errors, consult the hardware vendor to troubleshoot the physical NIC errors, as the ESXi host only reports the errors that the physical NIC detects. You may also need to engage your network team to trace the cause of the errors. Whether you should be concerned depends on the values reported, when they are reported, and how often they are reported.

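Because the counters are cumulative, "how often" is best measured as the change between two readings. The following Python sketch computes the drops per second and the fraction of arriving packets that were dropped between two readings. The `Snapshot` type is a hypothetical record, and the sketch assumes that dropped packets are not included in the packets-received count. It does not decide whether a value is a concern; compare the results against your vendor's guidance.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Snapshot:
    """Hypothetical record of one reading of a pNIC's receive counters."""
    taken_at: float        # seconds, e.g. time.time() when the counters were read
    packets_received: int
    receive_dropped: int


def drop_rate(earlier: Snapshot, later: Snapshot) -> Tuple[float, float]:
    """Return (drops per second, fraction of arriving packets dropped) between readings."""
    seconds = later.taken_at - earlier.taken_at
    if seconds <= 0:
        raise ValueError("the later snapshot must be taken after the earlier one")
    dropped = later.receive_dropped - earlier.receive_dropped
    received = later.packets_received - earlier.packets_received
    if dropped < 0 or received < 0:
        raise ValueError("counters went backwards; they were probably reset")
    # Assumption: dropped packets are not part of packets_received.
    arrived = received + dropped
    fraction = dropped / arrived if arrived else 0.0
    return dropped / seconds, fraction
```
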
In the following example, the 'Receive packets dropped' (droppedRx) counter has a value of 1156343, but you need to examine the pNIC private statistics to determine the cause of the drops:

NIC statistics for vmnic4:
Packets received: 98811091721
Packets sent: 115495631953
Bytes received: 94549901259474
Bytes sent: 116028215249968
Receive packets dropped: 1156343
Transmit packets dropped: 0
Multicast packets received: 2366401
Broadcast packets received: 0
Multicast packets sent: 0
Broadcast packets sent: 0
Total receive errors: 0
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 0
Receive frame errors: 0
Receive FIFO errors: 0
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
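
With this many counters, it helps to filter for the ones that are non-zero. The following Python sketch parses "Name: value" lines like those in the NIC statistics above and keeps only non-zero counters whose names mention drops or errors. The function names are hypothetical, and the name-based filter is a simple assumption that matches the counter names shown in this example.

```python
from typing import Dict


def parse_nic_stats(text: str) -> Dict[str, int]:
    """Parse 'Name: value' lines from NIC statistics output into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        # Skip lines without a numeric value, such as "NIC statistics for vmnic4:".
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def problem_counters(stats: Dict[str, int]) -> Dict[str, int]:
    """Keep only non-zero counters whose names mention drops or errors."""
    return {
        name: value
        for name, value in stats.items()
        if value and ("dropped" in name.lower() or "error" in name.lower())
    }
```

Applied to the statistics above, only 'Receive packets dropped' (1156343) is returned.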

You can gather the private statistics by running the following script from the CLI of the ESXi host being investigated:

/usr/lib/vmware/vm-support/bin/nicinfo.sh

In the following example of private statistics, the 'rx_no_bufs' counter equals the 'Receive packets dropped' counter, so it can be concluded that the packet buffers/queue are being filled faster than the host can remove packets from the queue and process them:

NIC Private statistics:

tx_frames_ok: 115495633765
tx_unicast_frames_ok: 115492346861
tx_multicast_frames_ok: 971987
tx_broadcast_frames_ok: 2314917
tx_bytes_ok: 116028216606031
tx_unicast_bytes_ok: 116027927182476
tx_multicast_bytes_ok: 150527407
tx_broadcast_bytes_ok: 138896148
tx_drops: 0
tx_errors: 0
tx_tso: 0
rx_frames_ok: 98811093577
rx_frames_total: 98812249920
rx_unicast_frames_ok: 98748950011
rx_multicast_frames_ok: 2366401
rx_broadcast_frames_ok: 60933508
rx_bytes_ok: 94549902853644
rx_unicast_bytes_ok: 94547693433065
rx_multicast_bytes_ok: 350369527
rx_broadcast_bytes_ok: 3657148407
rx_drop: 0
rx_no_bufs: 1156343
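
The conclusion above can be checked with simple arithmetic: 98812249920 (rx_frames_total) minus 98811093577 (rx_frames_ok) is 1156343, the same value as rx_no_bufs and 'Receive packets dropped'. The following Python sketch performs these checks. The counter names are those of the driver in this example and other drivers may use different names; the function name is hypothetical.

```python
from typing import Dict


def reconcile_rx_drops(private: Dict[str, int], receive_dropped: int) -> Dict[str, bool]:
    """Check whether the public RX drop counter is explained by rx_no_bufs.

    Counter names are those of the driver in the example above; other drivers differ.
    """
    not_ok = private["rx_frames_total"] - private["rx_frames_ok"]
    return {
        # Every public RX drop is accounted for by "no receive buffer available".
        "no_bufs_equals_dropped": private["rx_no_bufs"] == receive_dropped,
        # The frames that were not received OK are exactly the no-buffer frames.
        "not_ok_equals_no_bufs": not_ok == private["rx_no_bufs"],
        # The driver's other RX drop counter is not contributing.
        "rx_drop_is_zero": private["rx_drop"] == 0,
    }


example = {
    "rx_frames_ok": 98811093577,
    "rx_frames_total": 98812249920,
    "rx_drop": 0,
    "rx_no_bufs": 1156343,
}
print(reconcile_rx_drops(example, receive_dropped=1156343))
# {'no_bufs_equals_dropped': True, 'not_ok_equals_no_bufs': True, 'rx_drop_is_zero': True}
```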

One possible way to resolve the buffer being filled is to increase the size of the pNIC RX ring buffer. See KB 341594 for more details.
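
Why a larger ring helps, and where it stops helping, can be shown with arithmetic. The following Python sketch estimates how long an empty RX ring lasts when packets arrive faster than the host drains them. It assumes one descriptor per packet and steady rates, and the ring sizes and packet rates are hypothetical values chosen for illustration.

```python
import math


def ring_fill_time_ms(ring_entries: int, arrival_pps: float, drain_pps: float) -> float:
    """Milliseconds until an empty RX ring overflows.

    Assumes one descriptor per packet, packets arriving at a steady arrival_pps,
    and the host draining the ring at a steady drain_pps. Returns math.inf when
    the host keeps up, because the ring then never fills.
    """
    excess_pps = arrival_pps - drain_pps
    if excess_pps <= 0:
        return math.inf
    return ring_entries / excess_pps * 1000.0


# Hypothetical burst: 1,000,000 packets/s arriving, host draining 900,000 packets/s.
for entries in (1024, 4096):
    print(entries, "entries:", round(ring_fill_time_ms(entries, 1_000_000, 900_000), 2), "ms")
```

A ring four times larger absorbs a burst four times longer. If packets arrive faster than the host drains them indefinitely, any ring eventually fills, so a larger ring helps with bursts but not with sustained overload.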

Consult your hardware vendor to determine whether your pNIC type supports RSS (Receive Side Scaling) and how many RSS engines it supports. If RSS is supported, confirm whether it is enabled and how many RSS engines are configured. If it is not enabled, consider enabling it based on the vendor's recommendations. If your NIC does not support multiple RSS engines, consider discussing Default RSS with your vendor as an alternative.
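
RSS hashes each flow's addresses and ports to choose one of several receive queues, so traffic that would otherwise fill a single queue is spread across queues that can be processed separately. The following Python sketch uses the Toeplitz hash commonly specified for RSS, an example 40-byte key, and a round-robin 128-entry indirection table. The key, table size, and hashed fields a given NIC uses are vendor- and driver-specific, so treat all of them as assumptions; the function names are hypothetical.

```python
import ipaddress
import struct

# Example 40-byte key (a widely published RSS sample key). The key a NIC actually
# uses is set by its driver or firmware.
EXAMPLE_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcbae7b30b477cb2da38030f20c6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash of data, the hash commonly used for RSS."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key is too short for this input")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            # For each set input bit, XOR in the 32-bit key window starting at bit i.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rx_queue_for_flow(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                      num_queues: int, key: bytes = EXAMPLE_KEY) -> int:
    """Pick an RX queue for an IPv4 TCP flow through a 128-entry indirection table."""
    data = (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))
    table = [entry % num_queues for entry in range(128)]  # round-robin fill (assumed)
    return table[toeplitz_hash(key, data) & 0x7F]


# Packets of one flow always land in the same queue; different flows spread out.
queues = {rx_queue_for_flow("10.0.0.1", "10.0.0.2", port, 443, num_queues=4)
          for port in range(40000, 40064)}
print(sorted(queues))
```

Keeping each flow on one queue preserves packet order within the flow, while different flows share the receive load.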

NOTE: In the context of this article, the word "vendor" means:

a) The server vendor, if the server was purchased with the device (network adapter and/or HBA) installed; or

b) The vendor of the device, if the device (network adapter and/or HBA) was purchased separately.

Dropped packets can also be caused by CPU or memory contention on the host. For information on how to troubleshoot ESXi performance, refer to the following KB:

Troubleshooting ESX/ESXi virtual machine performance issues