Bad network performance is observed with latency between VMs located on different ESXi hosts

Article ID: 413864

Products

VMware vSphere ESXi
VMware Telco Cloud Infrastructure
VMware Telco Cloud Platform

Issue/Introduction

The following symptoms are observed on ESXi hosts:

  • The ixgben driver reports a false hang in the vmkernel log, for example:
xxxx-xx-xxTxx:xx:xx.xxxZ cpu24:xxxxxxx)ixgben: ixgben_CheckTxHang:xxxx: vmnicX: false hang detected on TX queue 2
  • The physical NICs of the ESXi host are Intel NICs that use the ixgben driver.
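
To confirm the first symptom, the vmkernel log can be searched for the message shown above. The following is a minimal sketch, assuming a POSIX shell with sed, sort, uniq and awk available; the function name count_false_hangs is hypothetical, and the pattern is derived only from the sample log line in this article.

```shell
# Count "false hang detected" messages per NIC in a vmkernel log file.
# The pattern follows the sample line in this article:
#   ...ixgben: ixgben_CheckTxHang:NNNN: vmnicX: false hang detected on TX queue N
# Output: one line per NIC in the form "<vmnic> <count>".
count_false_hangs() {
    sed -n 's/.*ixgben_CheckTxHang.*: \(vmnic[0-9][0-9]*\): false hang detected.*/\1/p' "$1" \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example (run manually on the ESXi host):
#   count_false_hangs /var/log/vmkernel.log
```

A NIC that appears in the output with a steadily growing count is a candidate for the workaround described in the Resolution section.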

Environment

ESXi 7.x
ESXi 8.x

Cause

The root cause is a bug in how Intel's ixgben driver handles queue utilization, which triggers "TX false hang" messages in the vmkernel logs.

A "TX false hang" indicates a possible lost interrupt, which may degrade traffic performance through the physical NICs.
This happens when the physical NICs carrying traffic use more TX queues than RX queues.

Resolution

Intel fixed the issue in async driver version ixgben-1.22.1.0, which is available for ESXi 8.0 and later.
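
To decide whether a host already runs the fixed driver, compare the installed ixgben version with 1.22.1.0. The sketch below is illustrative only: version_lt is a hypothetical helper, it assumes awk is available in the shell, and it handles purely numeric dotted versions. The installed version has to be read manually, for example from the driver information shown by esxcli network nic get -n vmnicX.

```shell
# Succeed (exit status 0) if dotted version $1 is older than dotted version $2.
# Missing trailing components are treated as 0 (for example, 1.22 == 1.22.0.0).
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 1   # versions are equal, so $1 is not older
    }'
}

# Example (the installed version here is a placeholder value):
#   installed="1.15.1.0"
#   if version_lt "$installed" "1.22.1.0"; then
#       echo "ixgben $installed predates the fix; apply the workaround or update"
#   fi
```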

To work around the issue on ESXi 7.0, or with any driver version earlier than ixgben-1.22.1.0, follow these steps:

  1. Check which ixgben parameters are currently set with this command:
    $ esxcli system module parameters list -m ixgben

    Note any parameters that are already set so that you can verify them after making changes.

  2. Enable DevRSS on the ESXi hosts. This enables more RX queues, which can mitigate a possible bottleneck on the RX side of the driver:
    $ esxcli system module parameters set -p DevRSS=<value_list> -m ixgben
    $ reboot

    or

    $ esxcli system module parameters set -a -p DevRSS=<value_list> -m ixgben
    $ reboot

The "-a" option appends the new parameter and leaves the other parameters that are already set unchanged.

The "value_list" for DevRSS is typically a comma-separated list of values, where each value corresponds to a specific physical NIC handled by that module.

  • A value of 1 enables RSS on that physical port.

  • A value of 0 disables RSS on that physical port.

For example, the command esxcli system module parameters set -p DevRSS=1,1,1,1 -m ixgben enables DevRSS on all 4 physical NICs of the ESXi host.
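
For hosts with a different number of ixgben ports, the value list can be generated instead of typed by hand. This is a minimal sketch for a POSIX shell; build_devrss_list is a hypothetical helper name, and the port count must be supplied by the reader (for example, by counting the ixgben entries in the output of esxcli network nic list).

```shell
# Print a comma-separated DevRSS value list with one entry per ixgben port.
#   $1 = number of physical ports handled by ixgben
#   $2 = value applied to every port (1 = enable, 0 = disable; defaults to 1)
build_devrss_list() {
    count=$1
    value=${2:-1}
    list=""
    i=0
    while [ "$i" -lt "$count" ]; do
        # Add a comma separator only when the list already has an entry.
        list="${list:+$list,}$value"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Example (run manually on the ESXi host, followed by a reboot):
#   esxcli system module parameters set -a -p "DevRSS=$(build_devrss_list 4)" -m ixgben
```

The helper only builds the string; it does not run esxcli, so the resulting command can be reviewed before it is applied.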

NOTE:
DevRSS conflicts with RSS and DRSS. Remove any existing RSS and DRSS configuration.

Additional Information

On affected ESXi hosts, run vsish -e get /net/pNics/vmnicX/stats and identify the RX queues and TX queues whose "rxPkts" and "txPkts" counters, respectively, are incrementing. Check whether more TX queues than RX queues are in use.
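
One way to see which counters are incrementing is to save the statistics twice, a few seconds apart, and list the lines that changed. The sketch below makes no assumption about the layout of the vsish output: it compares two saved files line by line. The helper name changed_lines and the file paths are hypothetical, and awk is assumed to be available in the shell.

```shell
# Print the lines of snapshot file $2 that differ from the same-numbered
# lines of snapshot file $1, prefixed with their line number.
# The first file must not be empty.
changed_lines() {
    awk 'NR == FNR { before[FNR] = $0; next }
         before[FNR] != $0 { print FNR ": " $0 }' "$1" "$2"
}

# Example (run manually on the ESXi host; replace vmnicX with the NIC name):
#   vsish -e get /net/pNics/vmnicX/stats > /tmp/stats.before
#   sleep 10
#   vsish -e get /net/pNics/vmnicX/stats > /tmp/stats.after
#   changed_lines /tmp/stats.before /tmp/stats.after | grep -E 'rxPkts|txPkts'
```

The reported line numbers can be looked up in the saved snapshot to see which queue each changed counter belongs to.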