During vMotion the vmnic in use by the vmkernel adapter changes mid vMotion

Article ID: 429235

Products

VMware vSphere ESXi

Issue/Introduction

When this issue occurs the following behavior may be seen:

  • The physical switch reports that one of the physical switchports is down.
  • In the vSphere UI, the hosts may report that redundancy has been lost during vMotion activity.
  • The vMotion may fail after one or both of the first two symptoms appear, although these symptoms can also occur when the vMotion succeeds.
  • Live esxtop data reviewed during the vMotion shows the vmkernel adapter (either a dedicated vMotion vmkernel adapter or the host management vmkernel adapter, depending on the environment's configuration) associated with one vmnic and then, partway through the vMotion, with a different vmnic.
    • To view this, log into the ESXi host over SSH (for example with PuTTY), run the command esxtop, press Enter, and then press n to open the network view. For example, a vmkernel adapter vmk2 that is responsible for vMotion may initially be shown using vmnic3.
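
As a complement to the interactive esxtop view, the link state of the physical uplinks can be sampled from the ESXi shell while the vMotion runs. The following is a minimal sketch rather than a supported tool: the function name watch_nic_links and its default sample count and interval are assumptions made for illustration. It relies only on esxcli network nic list, whose output includes a Link Status column for each vmnic.

```shell
# Hypothetical helper: sample uplink link state a fixed number of times.
# Assumes it runs in the ESXi shell, where esxcli is available.
watch_nic_links() {
    samples="${1:-30}"    # how many samples to take (assumed default)
    interval="${2:-2}"    # seconds to wait between samples (assumed default)
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Timestamp each sample so it can be lined up with the vMotion task
        echo "--- $(date)"
        # The Link Status column reports the link state of each vmnic
        esxcli network nic list
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Running watch_nic_links 60 1 shortly before starting the vMotion would take one sample per second for about a minute. A vmnic whose link status briefly changes during the migration is a candidate for the log check described under Resolution.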


Environment

VMware vSphere ESXi

Cause

This issue occurs due to a firmware-driven NIC reset triggered by the driver's detection of a transmit (TX) hang. High-throughput operations, such as vMotion, can expose timing sensitivities or buffer management defects in specific NIC driver/firmware combinations, leading to a temporary hardware stall that forces a reset.

Resolution

To confirm the issue, review the host's vmkernel log for TX hangs reported by the NIC driver. Note that the following is an example and actual log output may differ.

  1. Log into the affected ESXi host over SSH (for example with PuTTY).
  2. Run the command: cd /var/log and press Enter to change to the log directory.
  3. Run the command: less vmkernel.log
  4. Once the log is open, type -i to make searches case-insensitive.
  5. To search for TX hangs, type /tx hang and press Enter. If any TX hangs are reported, the matching lines identify the NIC driver and the PCI identifier of the affected device, for example ixgben 0000:##:00.0. Both vary from case to case with the hardware in use.
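
The interactive search above can also be run as a single non-interactive command, which is convenient when checking several hosts. This is a minimal sketch: the function name find_tx_hangs is hypothetical, and the default path simply follows the /var/log/vmkernel.log location used in the steps above.

```shell
# Hypothetical helper: print every log line that mentions a TX hang.
# Optional argument: path to a log file (defaults to the live vmkernel log).
find_tx_hangs() {
    log_file="${1:-/var/log/vmkernel.log}"
    # -i ignores case (like typing -i in less); -n prefixes each match
    # with its line number in the file
    grep -i -n 'tx hang' "$log_file"
}
```

Calling find_tx_hangs with no argument searches the live log, and passing a path searches a copied or extracted log instead. The function returns a non-zero status when nothing matches, so no output means no TX hang was found in that file.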

Because the NIC driver is the component reporting the TX hangs, it is recommended to work with the driver/firmware vendor to investigate the issue further. This recommendation is made because the NIC driver is outside the scope of VMware by Broadcom support.

It is also recommended that the environment use the most up-to-date driver/firmware versions listed in the compatibility guide, which contains vendor-tested versions.

For more details on checking driver/firmware versions, refer to Determining Network/Storage firmware and driver version in ESXi and the VMware by Broadcom Compatibility Guide.
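
As a quick way to collect the values to compare against the compatibility guide, the driver and firmware details of a single uplink can be printed from the ESXi shell. This is a minimal sketch: the function name show_nic_versions is hypothetical, and vmnic3 in the usage note is only the uplink named in the earlier example.

```shell
# Hypothetical helper: show driver and firmware details for one uplink.
# Assumes it runs in the ESXi shell, where esxcli is available.
show_nic_versions() {
    # Stop with a usage message if no vmnic name was given
    nic="${1:?usage: show_nic_versions vmnicN}"
    # The "Driver Info" section of the output lists the driver name,
    # driver version, and firmware version for the given vmnic
    esxcli network nic get -n "$nic"
}
```

For example, show_nic_versions vmnic3 prints the details for vmnic3. Repeating it for each uplink used by the vMotion vmkernel adapter gives the driver/firmware combinations to check with the vendor.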