After enabling EDP standard mode, RX missed error alarms are reported on hosts with pNICs using the i40en driver

Article ID: 415569

Products: VMware NSX, VMware vSphere ESX 8.x

Issue/Introduction

  • EDP standard mode has recently been enabled on the hosts. The hosts use Intel pNICs running an i40en driver version earlier than 2.11.1.0.
  • Since EDP was enabled, vSAN Skyline Health has been reporting high RX missed error alarms. The alarms are similar to those described in KB 312096.
  • On hosts with EDP standard mode enabled, gather the pNIC packet counters with the following command:

/usr/lib/vmware/vm-support/bin/ens_info.sh > ens_info.sh.txt

  • Searching the resulting file for the uplink statistics confirms that packets are being missed (note the non-zero rxMissErrors counter):

grep -A 25 "Uplink stats for uplink" ens_info.sh.txt
Uplink stats for uplink vmnic0
Uplink Stats:
 rxPkts:             16651677
 txPkts:             26546949
 rxBytes:            3697611393
 txBytes:            6703593607
 rxErrors:           0
 txErrors:           0
 rxDrops:            0
 txDrops:            68016
 rxMulticastPkts:    2768118
 rxBroadcastPkts:    424732
 txMulticastPkts:    10822
 txBroadcastPkts:    5779
 collisions:         0
 rxLengthErrors:     0
 rxOverflowErrors:   0
 rxCRCErrors:        0
 rxFrameAlignErrors: 0
 rxFifoErrors:       0
 rxMissErrors:       5872913
 txAbortedErrors:    0
 txCarrierErrors:    0
 txFifoErrors:       0
 txHeartbeatErrors:  0
 txWindowErrors:     0
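On hosts with several uplinks, reading each grep block by eye is slow. The sketch below summarises rxPkts and rxMissErrors per uplink with awk and flags any uplink whose miss counter is non-zero. It assumes the output layout shown above (an "Uplink stats for uplink <vmnicN>" header followed by indented "counter: value" lines). Other ENS versions may format the output differently, and the function name summarize_rx_miss is hypothetical, not a VMware tool.

```shell
#!/bin/sh
# summarize_rx_miss (hypothetical helper): print rxPkts and rxMissErrors for
# each uplink in ens_info.sh output. Assumes the layout shown in this article:
# an "Uplink stats for uplink <vmnicN>" header followed by "counter: value"
# lines. Reads the files given as arguments, or stdin if none are given.
summarize_rx_miss() {
  awk '
    # Remember which uplink the following counters belong to.
    /^Uplink stats for uplink / { uplink = $NF; next }
    uplink != "" && $1 == "rxPkts:" { pkts[uplink] = $2 }
    uplink != "" && $1 == "rxMissErrors:" {
      miss[uplink] = $2
      order[++n] = uplink
      uplink = ""   # stop attributing counters until the next header
    }
    END {
      for (i = 1; i <= n; i++) {
        u = order[i]
        # Flag uplinks whose miss counter is non-zero.
        printf "%s rxPkts=%s rxMissErrors=%s%s\n", u, pkts[u], miss[u], (miss[u] + 0 > 0 ? " MISSED" : "")
      }
    }
  ' "$@"
}
```

For example, `grep -A 25 "Uplink stats for uplink" ens_info.sh.txt | summarize_rx_miss` on the output above would print `vmnic0 rxPkts=16651677 rxMissErrors=5872913 MISSED`. Piping the grep output, rather than the whole file, keeps the parser limited to the uplink statistics blocks.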

Environment

VMware NSX 4.2.X

VMware vSphere ESX 8.X U3

Cause

  • The issue appears after enabling EDP, which changes pNIC tuning parameters to optimise EDP performance. The setting relevant to this issue is DRSS (Default Receive Side Scaling), which is enabled and makes use of the default queue on the pNIC.
  • The affected i40en driver versions activate only 4 DRSS queues, while the driver and the ENS stack assume that more RSS queues are in use.
  • As a result, the RSS queues of the default queue are not correctly mapped to interrupts. Some RSS queues never receive interrupts to process incoming packets, so packets arriving on those queues can be counted as RX missed and dropped.

Resolution

  • The issue is fixed in async i40en driver version 2.11.1.0 and later. For help downloading the drivers, see the following KB:

https://knowledge.broadcom.com/external/article?articleId=366755
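To decide whether a host needs the upgrade, the installed async i40en version (for example, as listed by `esxcli software vib list`) can be compared against 2.11.1.0. The sketch below is a minimal comparison under that assumption. The helper names version_at_least and check_i40en_version are hypothetical. The check only applies to the async driver; per this article, the 8.0U3 inbox driver is not affected.

```shell
#!/bin/sh
# version_at_least (hypothetical helper): succeed if driver version $1 is at
# least $2, comparing dot-separated fields numerically. Any build suffix after
# the first "-" in $1 (as in VIB version strings) is ignored.
version_at_least() {
  awk -v v="${1%%-*}" -v m="$2" 'BEGIN {
    nv = split(v, a, "[.]")
    nm = split(m, b, "[.]")
    n = (nv > nm) ? nv : nm
    for (i = 1; i <= n; i++) {
      x = a[i] + 0   # missing fields count as 0
      y = b[i] + 0
      if (x > y) exit 0
      if (x < y) exit 1
    }
    exit 0           # versions are equal
  }'
}

# check_i40en_version (hypothetical helper): report whether an async i40en
# driver version includes the fix shipped in 2.11.1.0.
check_i40en_version() {
  if version_at_least "$1" 2.11.1.0; then
    echo "$1: 2.11.1.0 or later, includes the fix"
  else
    echo "$1: older than 2.11.1.0, upgrade the async i40en driver"
  fi
}
```

The fields are compared numerically because a plain string comparison would wrongly rank a hypothetical version such as 2.5.3.0 above 2.11.1.0. For example, `check_i40en_version 2.5.3.0` prints `2.5.3.0: older than 2.11.1.0, upgrade the async i40en driver`.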

Additional Information

  • The 8.0U3 inbox driver is not affected by this issue.