Getting alarm "High pNic error rate detected" on hosts in vSAN clusters using Mellanox NICs

Article ID: 392333

Products

VMware vSAN 7.x, VMware vSAN 8.x

Issue/Introduction

  • In the vSAN performance view, the “High pNic RX Missed error rate detected” error is displayed.
  • Running the command /usr/lib/vmware/vm-support/bin/nicinfo.sh on an affected ESXi host shows that the "Receive missed errors" count correlates with the outOfBuffer count (the two values are identical in this example):

       NIC statistics for vmnic1:
          Packets received: 150846641743
          Packets sent: 41468571098
          Bytes received: 181586517120376
          Bytes sent: 140804844490064
          Receive packets dropped: 0
          Transmit packets dropped: 0
          Multicast packets received: 377136363
          Broadcast packets received: 561118710
          Multicast packets sent: 1518626
          Broadcast packets sent: 552730
          Total receive errors: 0
          Receive length errors: 0
          Receive over errors: 0
          Receive CRC errors: 0
          Receive frame errors: 0
          Receive FIFO errors: 0
          Receive missed errors: 174298

      NIC Private statistics: 

          PSID: HP_2690110034
          firmware syndrome: 0x0000
          asicSensorTemperature: 40
          rxSwPackets: 150846644025
          rxSwBytes: 181586518824277
          txSwPackets: 41468572423
          txSwTsoPackets: 5914586974
          txSwTsoBytes: 129653616908942
          txSwTsoInnerPackets: 0
          txSwTsoInnerBytes: 0
          rxSwCsumUnnecessary: 0
          rxSwCsumNone: 688525027
          rxSwCsumComplete: 150158118998
          rxSwCsumUnnecessaryInner: 1
          txSwCsumPartial: 41451768126
          txSwCsumPartialInner: 0
          txSwQueueStopped: 277
          txSwQueueWake: 277
          txSwQueueDropped: 0
          txSwXmitMore: 10585153893
          rxSwWqeErr: 0
          rxSwBuffAllocErr: 0
          linkDownEventsPhy: 0
          watchdogReset: 0
          outOfBuffer: 174298
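
The comparison above can also be done on saved nicinfo.sh output instead of by eye. The following is a minimal sketch, not part of the original procedure: the function name compare_missed_vs_oob is hypothetical, and it assumes the counter lines are formatted exactly as in the sample output. If the input covers several NICs, only the last occurrence of each counter is compared.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): reads nicinfo.sh-style output on
# stdin and reports whether "Receive missed errors" equals "outOfBuffer".
compare_missed_vs_oob() {
    awk -F': *' '
        {
            # Counter name without leading indentation, value reduced to digits.
            name = $1; sub(/^[ \t]+/, "", name)
            value = $2; gsub(/[^0-9]/, "", value)
        }
        name == "Receive missed errors" { missed = value }
        name == "outOfBuffer"           { oob = value }
        END {
            printf "missed=%s outOfBuffer=%s\n", missed, oob
            # "match" only when the counter is non-zero and both values agree.
            if (missed + 0 > 0 && missed + 0 == oob + 0) { print "match" } else { print "no-match" }
        }'
}

# Demo using the two counter lines from the sample output above.
compare_missed_vs_oob <<'EOF'
          Receive missed errors: 174298
          outOfBuffer: 174298
EOF
```

A "match" result is only an indication that the missed packets were dropped for lack of receive buffers; it does not identify which of the possible causes is responsible.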

     

 

Environment

VMware vSAN 8.x

Cause

The outOfBuffer ("out of buffer") condition can be caused by slow software processing of received packets, overcommitment of the hypervisor, or the physical NIC itself.

Receive missed errors indicate that the NIC could not store or process incoming packets because it ran out of hardware receive buffers.

Resolution

Suggested workaround:

  1. You can set the uplink ring buffer sizes to their allowed maximums on all hosts using this command, for example:

       #   esxcli network nic ring current set -n vmnic1 -r 8192 -t 8192

    Note: These changes impact network adapter performance and must be validated by the hardware vendor prior to implementing the change. 

    [

        • In certain instances, querying (esxcli network nic ring current get) or changing (esxcli network nic ring current set) the ring buffers fails with the following errors:

    [root@esx01:~] esxcli network nic ring current get -n vmnic0
    Unable to complete Sysinfo operation.  Please see the VMkernel log file for more details.: Not supported: VSI node (233:VSI_NODE_net_pNics_firmware_ringParams)

    [root@esxvsan1:~] esxcli network nic ring current set -n vmnic1 -r 8192 -t 8192
    Unable to complete Sysinfo operation.  Please see the VMkernel log file for more details.: Not supported: VSI node (233:VSI_NODE_net_pNics_firmware_ringParams)

        • /var/run/log/vmkernel.log shows "Access denied by vmkernel access control policy" messages:

    2025-07-30T05:54:00.758Z In(182) vmkernel: cpu49::########)osfs: OSFS_GetMountPointList:3748: mountPoints[0] inUse pid [    vsan], cid ###############-##################
    2025-07-30T05:54:00.917Z In(182) vmkernel: cpu5::########)VmkAccess: 106: python3: running in nsxDatapathCtrsDom(81): socket = /var/run/nscd/socket (unix_stream_socket_connect): Access denied by vmkernel access control policy
    2025-07-30T05:54:00.917Z In(182) vmkernel: cpu5::########)VmkAccess: 106: python3: running in nsxDatapathCtrsDom(81): socket = /var/run/nscd/socket (unix_stream_socket_connect): Access denied by vmkernel access control policy
    2025-07-30T05:54:00.921Z In(182) vmkernel: cpu5:########)VmkAccess: 106: python3: running in nsxDatapathCtrsDom(81): socket = /dev/log (unix_dgram_socket_connect): Access denied by vmkernel access control policy 

        • This occurs when the ENS driver manages parameter changes for the physical adapters.

        • In that case, run the following commands to change the ring buffer values:

    --- For receive packets:

    # nsxdp-cli ens uplink ring set -r 8192 -n vmnicX

    # net-dvs --persist


    --- For transmit packets:

    # nsxdp-cli ens uplink ring set -t 8192 -n vmnicX

    # net-dvs --persist

    Note: Before changing the ring buffer values, either reset the counters (by rebooting the ESXi host) or record the current 'Receive missed errors' values, then monitor them for further increments after the change.

    ]


  2. If updating the ring buffer size does not correct this issue, then the next step is to enable flow control on the physical switch port that is connecting to the Mellanox pNIC.

  3. Furthermore, the nmlx5 driver provides a driver module parameter “dropless_rq”.

 

    • It may help by letting the hardware temporarily buffer incoming packets, and by allowing the flow control (pause) mechanisms in the NIC firmware to handle congestion between the NIC port and the physical switch port, thus avoiding packet loss.

    • If no driver parameters are currently set: esxcli system module parameters set -m nmlx5_core -p dropless_rq=1

    • If driver parameters are already set (-a appends to the existing parameter string instead of replacing it): esxcli system module parameters set -m nmlx5_core -a dropless_rq=1

    • A host reboot is required for the change to take effect.
    • To activate it, ensure that flow control is also enabled. See: Configuring Flow Control on VMware ESXi and VMware ESX

 

Note: Contact the hardware vendor to validate the commands above before running them. They are provided for reference only.
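
To judge whether a change helped, the 'Receive missed errors' counter can be recorded before the change and compared with a later reading. The following is a minimal sketch under stated assumptions: the function names missed_errors and missed_delta are hypothetical, the snapshot files are assumed to contain nicinfo.sh output for a single NIC saved by the administrator, and the second counter value in the demo is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the first "Receive missed errors" value found
# in a file containing saved nicinfo.sh output.
missed_errors() {
    awk -F': *' '
        { name = $1; sub(/^[ \t]+/, "", name) }
        name == "Receive missed errors" {
            value = $2; gsub(/[^0-9]/, "", value)
            print value
            exit
        }
    ' "$1"
}

# Hypothetical helper: prints how much the counter grew between two snapshots.
# A missing counter is treated as 0.
missed_delta() {
    before=$(missed_errors "$1")
    after=$(missed_errors "$2")
    echo $(( ${after:-0} - ${before:-0} ))
}

# Demo with illustrative snapshot files; 174350 is an invented later reading.
tmp=$(mktemp -d)
printf '          Receive missed errors: 174298\n' > "$tmp/before.txt"
printf '          Receive missed errors: 174350\n' > "$tmp/after.txt"
missed_delta "$tmp/before.txt" "$tmp/after.txt"   # prints 52
rm -rf "$tmp"
```

A delta of 0 over a representative load period suggests the counter has stopped incrementing. If the host was rebooted between the two snapshots, the counters were reset and the delta is not meaningful.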





Additional Information

Receive Missed Errors detected on Mellanox pNICs