vMotion operations stall indefinitely or progress very slowly

Article ID: 411623

Products

VMware vSphere ESXi

Issue/Introduction

  • vMotion operations may stall at 20% indefinitely, or progress slowly and time out.
  • The vMotion may sometimes eventually succeed, but not consistently.
  • The behavior may only be seen in one direction (e.g. a vMotion operation to a given host fails, but vMotion from the same host succeeds).
  • The issue may only be seen on one host.
  • Ping testing to or from the affected host's vMotion adapter succeeds with little or no packet loss reported.
  • The link lights for the vMotion NIC on the server or switch port may be blinking.

Environment

VMware vSphere

Cause

  • A failing or degraded NIC cable causes packet loss that is low enough to go undetected in normal ping testing, but frequent enough that the vMotion operation cannot succeed, as vMotion tasks require 0% packet loss.
  • Because fiber cables have separate transmit and receive lines, only one of them may be degraded, which is why vMotion tasks may fail in only one direction.
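
To illustrate why a short ping test can look clean while a large transfer still hits loss, here is a minimal sketch. The function name zero_loss_prob, the 0.1% loss rate, and the packet counts are illustrative assumptions, not measurements, and the calculation assumes each packet is lost independently.

```shell
# zero_loss_prob P N
# Prints the probability that N packets all arrive when each packet is
# lost independently with probability P (P and N are assumed values).
zero_loss_prob() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", (1 - p) ^ n }'
}

zero_loss_prob 0.001 10        # a short ping test
zero_loss_prob 0.001 100000    # a transfer made up of many packets
```

With these assumed values, the 10-packet test sees no loss about 99% of the time (0.990), while the 100,000-packet transfer is almost certain to lose at least one packet (0.000).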

Resolution

  • Identify the failing hardware in the network path and repair or replace it accordingly.

NOTE: To see which physical NIC (e.g. vmnic2) is currently being used by the vMotion vmkernel adapter (e.g. vmk1), open an SSH session to the host in question, run "esxtop", and press "n" (type "q" to exit).

  • To view the private NIC statistics and determine whether any failures are reported on the NIC that vMotion is using, run the command below:

/usr/lib/vmware/vm-support/bin/nicinfo.sh | less

NOTE: The above output is provided and recorded by the NIC driver, which VMware does not manage. Any errors seen in it will therefore need to be investigated further with the hardware vendor. See Troubleshooting NIC errors and other network traffic faults in ESXi for more information.
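
Because the script output can be long, a small filter may help surface counters worth a closer look. This is a sketch only: the helper name nonzero_error_counters is hypothetical, and it assumes the statistics appear as "name: value" lines, which can vary by driver.

```shell
# nonzero_error_counters
# Reads "name: value" lines on stdin and prints those whose name contains
# err, drop, crc, or discard (case-insensitive) and whose value is a
# nonzero integer. The "name: value" layout is an assumption.
nonzero_error_counters() {
    awk -F': *' '
        NF == 2 && tolower($1) ~ /err|drop|crc|discard/ && $2 ~ /^[0-9]+$/ && ($2 + 0) > 0 {
            sub(/^[ \t]+/, "", $1)    # trim leading indentation
            print $1 ": " $2
        }'
}

# Possible usage on the host:
#   /usr/lib/vmware/vm-support/bin/nicinfo.sh | nonzero_error_counters
```

Driver counters are typically cumulative, so comparing two runs taken before and after a vMotion attempt is likely to be more telling than a single snapshot.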

Additional Information

The faulty component can often be isolated by moving the vMotion traffic to another physical NIC, swapping the cables on the NIC used by vMotion, replacing the SFP, etc., and then checking whether the issue follows the NIC, the cables, or something else.
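
When comparing results before and after each swap, a longer ping run read by exact packet counts may show loss that a rounded percentage hides. The helper below is a sketch: the name ping_lost is hypothetical, and it assumes the ping summary line begins with "<sent> packets transmitted, <received> ...".

```shell
# ping_lost
# Reads ping output on stdin and prints "<lost> of <sent> packets lost",
# computed from the transmitted/received counts on the summary line
# (summary line format is an assumption).
ping_lost() {
    awk '/packets transmitted/ {
        sent = $1
        recv = $4
        printf "%d of %d packets lost\n", sent - recv, sent
    }'
}

# Possible usage on the host (vmk1 and the count of 1000 are examples):
#   vmkping -I vmk1 -c 1000 <destination vMotion IP> | ping_lost
```

Running the same test after each change (different NIC, swapped cable, replaced SFP) gives a like-for-like number to compare.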