Error stats for pnic reported in the hostd logs

Article ID: 399074

Products

VMware vSphere ESXi

Issue/Introduction

  • You see messages like the following being logged regularly on an ESX host in /var/run/log/hostd.log:

YYYY-MM-DDTHH:MM:SS.###<Time_Zone> warning hostd[2100510] [Originator@6876 sub=Statssvc] Error stats for pnic: vmnic1
--> droppedRx: 4640800
--> errorsRx: 6
--> RxCRCErrors: 6

  • You want to know how to interpret these logs and how they could be resolved.

Environment

VMware vSphere ESXi

Cause

These log messages report counter values that track how many packets have been dropped or received with errors since the host was last powered on.

errorsRx - An aggregate counter that increments each time a packet is received in error. You may need to check the additional counters, explained in the Resolution section, to find the specific reason the packets are reported as being in error. In the log example above, the 'errorsRx' value matches the 'RxCRCErrors' value, so we already know that the packets are being received with unexpected checksum values.

RxCRCErrors - Counts packets received with CRC errors. CRC errors usually indicate that packets are being corrupted somewhere along the data path to the receiving host.

droppedRx - An aggregate counter that increments each time a packet is dropped at the pNIC. You may need to check the additional counters, explained in the Resolution section, to find the specific reason the packets are reported as dropped.
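
The comparison described above can be scripted. The following is a minimal Python sketch, not a VMware tool: it parses the sample entry from this article and checks whether 'errorsRx' is fully accounted for by 'RxCRCErrors'. The function names and the regular expressions are illustrative assumptions based only on the log format shown in this article; a real script would read /var/run/log/hostd.log instead of the embedded sample.

```python
import re

# Sample entry copied from this article; replace with the contents of hostd.log.
SAMPLE = """\
YYYY-MM-DDTHH:MM:SS.###<Time_Zone> warning hostd[2100510] [Originator@6876 sub=Statssvc] Error stats for pnic: vmnic1
--> droppedRx: 4640800
--> errorsRx: 6
--> RxCRCErrors: 6
"""

HEADER = re.compile(r"Error stats for pnic: (\S+)")
COUNTER = re.compile(r"^--> (\w+): (\d+)\s*$")


def parse_error_stats(text):
    """Return {nic: {counter: value}}, keeping the most recent entry per NIC."""
    stats, current = {}, None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            # A new entry starts; later entries for the same NIC overwrite earlier ones.
            current = stats[header.group(1)] = {}
            continue
        counter = COUNTER.match(line)
        if counter and current is not None:
            current[counter.group(1)] = int(counter.group(2))
        elif not counter:
            current = None  # any other line ends the "-->" continuation block
    return stats


def crc_explains_rx_errors(counters):
    """True when every receive error is accounted for by CRC errors."""
    errors = counters.get("errorsRx", 0)
    return errors > 0 and errors == counters.get("RxCRCErrors", 0)


parsed = parse_error_stats(SAMPLE)
for nic, counters in parsed.items():
    print(nic, counters, "CRC explains errorsRx:", crc_explains_rx_errors(counters))
```

Because the counters are cumulative, running the same parse over entries logged at different times and comparing the values shows whether the errors are still increasing.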

Resolution

  • The first step in understanding and troubleshooting these counters is to consult the following KB, which explains each counter in detail and shows how to gather the counter data from a host:

Troubleshooting NIC errors and other network traffic faults in ESXi

  • If any NICs show errors, consult the hardware vendor to troubleshoot them, because the errors occur at the physical NIC and are only reported to the ESXi host. You may also need to engage your network team to trace the cause of the errors. Whether you should be concerned depends on the values being reported, when they are reported, and how often they are reported.
  • In the following example, the 'Receive packets dropped' (droppedRx) counter has a value of 1156343, but you need to look at the pNIC private statistics to determine the cause of the drops:

NIC statistics for vmnic4:
      Packets received: 98811091721
      Packets sent: 115495631953
      Bytes received: 94549901259474
      Bytes sent: 116028215249968
      Receive packets dropped: 1156343 
      Transmit packets dropped: 0
      Multicast packets received: 2366401
      Broadcast packets received: 0
      Multicast packets sent: 0
      Broadcast packets sent: 0
      Total receive errors: 0
      Receive length errors: 0
      Receive over errors: 0
      Receive CRC errors: 0
      Receive frame errors: 0
      Receive FIFO errors: 0
      Receive missed errors: 0
      Total transmit errors: 0
      Transmit aborted errors: 0
      Transmit carrier errors: 0
      Transmit FIFO errors: 0
      Transmit heartbeat errors: 0
      Transmit window errors: 0
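
As a rough illustration of reading this output, the Python sketch below parses an abridged copy of the vmnic4 statistics, lists the drop and error counters that are non-zero, and relates the drops to the total packets received. The helper name is hypothetical, and the ratio is only a way to put the raw value in context; this article does not define a threshold. Output in this shape is typically produced by `esxcli network nic stats get -n <vmnic>`, although the article does not name the command that was used.

```python
# Abridged copy of the vmnic4 output shown in this article.
SAMPLE = """\
NIC statistics for vmnic4:
      Packets received: 98811091721
      Receive packets dropped: 1156343
      Transmit packets dropped: 0
      Total receive errors: 0
      Receive CRC errors: 0
      Total transmit errors: 0
"""


def parse_nic_stats(text):
    """Map each 'Name: integer' line to an int; the header line is skipped."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


stats = parse_nic_stats(SAMPLE)

# Keep only the drop/error counters that have a non-zero value.
nonzero = {k: v for k, v in stats.items() if ("dropped" in k or "errors" in k) and v}

# Share of drops relative to received packets (illustrative, not a VMware metric).
drop_share = stats["Receive packets dropped"] / stats["Packets received"]

print(nonzero)
print(f"drops per received packet: {drop_share:.6%}")
```

Here only 'Receive packets dropped' is non-zero while every error counter is 0, which is why the next step is the private statistics.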

  • You can gather the private statistics by running the following script in the CLI on the ESX host being investigated:
/usr/lib/vmware/vm-support/bin/nicinfo.sh
  • In the following example of private statistics, the 'rx_no_bufs' counter equals the value of the 'Receive packets dropped' counter, so it can be concluded that the receive buffers/queues are being filled faster than the host can take packets off the queue and process them:

NIC Private statistics:

      tx_frames_ok: 115495633765
      tx_unicast_frames_ok: 115492346861
      tx_multicast_frames_ok: 971987
      tx_broadcast_frames_ok: 2314917
      tx_bytes_ok: 116028216606031
      tx_unicast_bytes_ok: 116027927182476
      tx_multicast_bytes_ok: 150527407
      tx_broadcast_bytes_ok: 138896148
      tx_drops: 0
      tx_errors: 0
      tx_tso: 0
      rx_frames_ok: 98811093577
      rx_frames_total: 98812249920
      rx_unicast_frames_ok: 98748950011
      rx_multicast_frames_ok: 2366401
      rx_broadcast_frames_ok: 60933508
      rx_bytes_ok: 94549902853644
      rx_unicast_bytes_ok: 94547693433065
      rx_multicast_bytes_ok: 350369527
      rx_broadcast_bytes_ok: 3657148407
      rx_drop: 0
      rx_no_bufs: 1156343
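
The matching step can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the counter names come from the example in this article and differ between NIC drivers, and the helper name is hypothetical. It looks for private counters whose value equals the 'Receive packets dropped' value and, as a cross-check, compares that value with the difference between 'rx_frames_total' and 'rx_frames_ok'.

```python
RX_DROPPED = 1156343  # 'Receive packets dropped' from the vmnic4 statistics

# Receive-side subset of the private statistics shown in this article.
PRIVATE = """\
      rx_frames_ok: 98811093577
      rx_frames_total: 98812249920
      rx_drop: 0
      rx_no_bufs: 1156343
"""


def parse_counters(text):
    """Map each 'name: integer' line to an int."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


private = parse_counters(PRIVATE)

# Private counters that account for the whole drop count on their own.
matches = [name for name, value in private.items() if value == RX_DROPPED]

# Frames seen by the NIC that were not delivered as good frames.
not_ok = private["rx_frames_total"] - private["rx_frames_ok"]

print("private counters equal to the drop count:", matches)
print("rx_frames_total - rx_frames_ok:", not_ok)
```

In this example both checks point to the same value, which supports the conclusion that the drops are caused by a lack of receive buffers.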

  • A possible solution when the buffer is filling up is to increase the size of the pNIC RX ring buffer. See KB 341594 for more details.

  • Consult your hardware vendor to determine whether Receive Side Scaling (RSS) is supported on your pNIC type and how many RSS engines are supported. If RSS is supported, confirm whether it is enabled and how many RSS engines are configured. If it is not enabled, consider enabling it based on vendor recommendations. If multiple RSS engines are not supported on your NIC, consider discussing with your vendor the use of Default RSS as an alternative.

NOTE: In the context of this article, the word "vendor" means:

a) The server vendor, if the server was purchased with the device (network adapter and/or HBA) installed; or

b) The vendor of the device, if the device (network adapter and/or HBA) was purchased separately.

  • Other possible causes of dropped packets include CPU or memory contention on the host. Refer to the following KB for information on how to troubleshoot ESX performance:

Troubleshooting ESX/ESXi virtual machine performance issues