Some VMs on an ESXi 6.5 host may lose network connectivity after vMotion or startup when the netqueue Tx queues for the vmnic are not properly initialized


Article ID: 317696



Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • After vMotion or VM power on, the VM loses network connectivity
  • Other VMs on the host may not be affected and may continue to function normally
  • The number of netqueue Tx queues reported for the physical NIC that the VM is using does not match the number of queues that have been activated:
vsish -e get /net/pNics/vmnicX/txqueues/info
tx queues info {
   # active queues:2
   default queue id:0
}

vsish -e ls /net/pNics/vmnicX/txqueues/queues/
0/

Note: Substitute vmnicX with the appropriate vmnic name. In the example above, 2 Tx queues are reported but only one queue, queue 0, appears in the list of activated queues. (A sketch after the Symptoms list shows one way to script this comparison across all vmnics.)

  • Linux VMs may report the following entries in the messages log within the guest OS:
Month DD HH:MM:SS hostname kernel: WARNING: at net/sched/sch_generic.c:265 dev_watchdog+0x26b/0x280() (Not tainted)
Month DD HH:MM:SS hostname kernel: Hardware name: VMware Virtual Platform
Month DD HH:MM:SS hostname kernel: NETDEV WATCHDOG: eth0 (vmxnet3): transmit queue 2 timed out
Month DD HH:MM:SS hostname kernel: Modules linked in: iptable_filter ip_tables ipv6 microcode vmware_balloon sg i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom vmxnet3 vmw_pvscsi p
ata_acpi ata_generic ata_piix vmwgfx ttm drm_kms_helper drm i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: vmci]
Month DD HH:MM:SS hostname kernel: Pid: 0, comm: swapper Not tainted 2.6.32-642.4.2.el6.x86_64 #1
Month DD HH:MM:SS hostname kernel: Call Trace:
Month DD HH:MM:SS hostname kernel: <IRQ> [<ffffffff8107c6f1>] ? warn_slowpath_common+0x91/0xe0
Month DD HH:MM:SS hostname kernel: [<ffffffff8107c7f6>] ? warn_slowpath_fmt+0x46/0x60
Month DD HH:MM:SS hostname kernel: [<ffffffff8149bd0b>] ? dev_watchdog+0x26b/0x280
Month DD HH:MM:SS hostname kernel: [<ffffffff8108ec75>] ? internal_add_timer+0xb5/0x110
Month DD HH:MM:SS hostname kernel: [<ffffffff8149baa0>] ? dev_watchdog+0x0/0x280
Month DD HH:MM:SS hostname kernel: [<ffffffff8108f907>] ? run_timer_softirq+0x197/0x340
Month DD HH:MM:SS hostname kernel: [<ffffffff8108f166>] ? update_process_times+0x76/0x90
Month DD HH:MM:SS hostname kernel: [<ffffffff8103e577>] ? native_apic_msr_write+0x37/0x40
Month DD HH:MM:SS hostname kernel: [<ffffffff81085275>] ? __do_softirq+0xe5/0x230
Month DD HH:MM:SS hostname kernel: [<ffffffff8100c38c>] ? call_softirq+0x1c/0x30
Month DD HH:MM:SS hostname kernel: [<ffffffff8100fca5>] ? do_softirq+0x65/0xa0
Month DD HH:MM:SS hostname kernel: [<ffffffff81085105>] ? irq_exit+0x85/0x90
Month DD HH:MM:SS hostname kernel: [<ffffffff81552cca>] ? smp_apic_timer_interrupt+0x4a/0x60
Month DD HH:MM:SS hostname kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
Month DD HH:MM:SS hostname kernel: <EOI> [<ffffffff8104601b>] ? native_safe_halt+0xb/0x10
Month DD HH:MM:SS hostname kernel: [<ffffffff8101696d>] ? default_idle+0x4d/0xb0
Month DD HH:MM:SS hostname kernel: [<ffffffff81009fe6>] ? cpu_idle+0xb6/0x110
Month DD HH:MM:SS hostname kernel: [<ffffffff8152f17a>] ? rest_init+0x7a/0x80
Month DD HH:MM:SS hostname kernel: [<ffffffff81c3b127>] ? start_kernel+0x429/0x436
Month DD HH:MM:SS hostname kernel: [<ffffffff81c3a33a>] ? x86_64_start_reservations+0x125/0x129
Month DD HH:MM:SS hostname kernel: [<ffffffff81c3a453>] ? x86_64_start_kernel+0x115/0x124
Month DD HH:MM:SS hostname kernel: ---[ end trace 7266a13370d01d2d ]---
  • Linux VMs may also report the following entries in the messages log within the guest OS:
Month DD HH:MM:SS hostname kernel: vmxnet3 0000:0b:00.0: eth0: tx hang
Month DD HH:MM:SS hostname kernel: vmxnet3 0000:0b:00.0: eth0: resetting
Month DD HH:MM:SS hostname kernel: vmxnet3 0000:0b:00.0: eth0: tx hang
Month DD HH:MM:SS hostname kernel: vmxnet3 0000:0b:00.0: eth0: intr type 3, mode 0, 9 vectors allocated
Month DD HH:MM:SS hostname kernel: vmxnet3 0000:0b:00.0: eth0: NIC Link is Up 10000 Mbps
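
The reported-versus-activated comparison above can be scripted so that every physical NIC on the host is checked in one pass. The following is a minimal sketch, not part of the original article: it assumes an ESXi shell, and its parsing depends on the exact vsish output format shown in the Symptoms above.

# Sketch: flag any vmnic whose activated Tx queue count differs from the
# count reported in its txqueues info node.
for NIC in $(vsish -e ls /net/pNics/ | tr -d '/'); do
   REPORTED=$(vsish -e get /net/pNics/$NIC/txqueues/info | sed -n 's/.*# active queues:\([0-9]*\).*/\1/p')
   ACTIVATED=$(vsish -e ls /net/pNics/$NIC/txqueues/queues/ | wc -l | tr -d ' ')
   echo "$NIC: reported=$REPORTED activated=$ACTIVATED"
   if [ "$REPORTED" != "$ACTIVATED" ]; then
      echo "  WARNING: queue mismatch on $NIC (possible hit for this issue)"
   fi
done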


Cause

This is a known issue with ESXi's netqueue load balancer feature, which can cause some of the adapter's Tx queues to fail to initialize properly. When the network load balancer then assigns a VM to one of these uninitialized queues, the VM loses network connectivity.

Note: Network connectivity loss can have many different root causes. You are experiencing the issue described in this article only if the number of Tx queues reported differs from the number of Tx queues activated, as described in the Symptoms section.

Resolution

This issue is resolved in ESXi 6.5 P04 (ESXi650-201912002).

ESXi 6.7 is not affected by this issue.

Workaround:
To work around this issue, perform the following steps:

1. Disable and re-enable the NIC at the ESXi level:
localcli network nic down -n vmnicX
localcli network nic up -n vmnicX
2. Check the number of queues reported:
vsish -e get /net/pNics/vmnicX/txqueues/info
3. Verify that the number of queues initialized matches the number reported in step 2:
vsish -e ls /net/pNics/vmnicX/txqueues/queues/
Note: Ensure that there are redundant NICs on the vSwitch prior to attempting this change, as bringing the uplink down will interrupt traffic on that NIC. The full sequence is combined in the sketch below.
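
For convenience, the workaround steps can be run as one sequence. This is a sketch only, not from the original article; vmnic1 is an example name, so substitute the affected vmnic and confirm uplink redundancy first.

# Example sequence; vmnic1 is a placeholder for the affected NIC.
NIC=vmnic1
# Step 1: disable and re-enable the NIC at the ESXi level.
localcli network nic down -n $NIC
localcli network nic up -n $NIC
# Steps 2 and 3: the queue count reported here...
vsish -e get /net/pNics/$NIC/txqueues/info
# ...should now match the number of queues listed here.
vsish -e ls /net/pNics/$NIC/txqueues/queues/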

Additional Information

Note that the number of queues listed under '/net/pNics/vmnicX/txqueues/queues/' will always be lower (1) than the number reported in '/net/pNics/vmnicX/txqueues/info' if the vmnic is in the 'Down' state, for example when no cable is plugged in.
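
Before treating such a mismatch as a symptom of this issue, confirm the link state of the vmnic. For example, using the standard esxcli command:

# The Link Status column for the vmnic in question should read "Up"
# before the queue counts above are expected to match.
esxcli network nic list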