VM Connectivity Loss and vMotion Failure due to iSCSI Path Degradation and Hardware Latency

Article ID: 435875

Products

VMware vSphere ESXi

Issue/Introduction

Virtual machines lost network connectivity, and subsequent attempts to vMotion them failed.
Log analysis shows severe I/O latency and physical link instability on the source host.

Symptoms & Error Messages:

  • vMotion Failure: "Migration failed: Timeout (0xbad0021)" and "Migration considered a failure by the VMX."

  • Storage Latency: ScsiDeviceIO: 1596: Device <REDACTED_NAA> performance has deteriorated. I/O latency increased from 1280us to 131801us.

  • Link Flapping: nmlx5_core: vmnic0: Changing link status from DOWN to UP.

  • Hardware Alert: SEL Message: Assert + Temperature Upper Critical going high.

  • Guest Impact: GuestRpcSendTimedOut: message to toolbox timed out.
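The messages above can also be found with a simple log scan. The following is a minimal triage sketch. It assumes the log lines contain the wording quoted in this article. Real vmkernel.log entries carry timestamps and prefixes, and the exact wording can vary between ESXi releases, so the patterns are assumptions to adapt. The function names `classify` and `latency_factor` are hypothetical helpers, not part of any VMware tool.

```python
from __future__ import annotations

import re

# Signatures derived from the symptom messages quoted in this article.
# Wording can vary between ESXi releases; treat these patterns as assumptions.
SIGNATURES = {
    "vmotion_timeout": re.compile(r"Migration failed: Timeout \(0xbad0021\)"),
    "storage_latency": re.compile(r"performance has deteriorated"),
    "link_flap": re.compile(r"vmnic\d+: Changing link status from (DOWN|UP) to (UP|DOWN)"),
    "thermal": re.compile(r"Temperature Upper Critical"),
    "guest_rpc": re.compile(r"GuestRpcSendTimedOut"),
}

# Accepts both "from 1280us to 131801us" and
# "from average value of 1280 microseconds to 131801 microseconds".
LATENCY = re.compile(
    r"latency increased from (?:average value of )?(\d+)\s*(?:us|microseconds) "
    r"to (\d+)\s*(?:us|microseconds)"
)


def classify(line: str) -> list[str]:
    """Return the name of every symptom signature that matches the line."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(line)]


def latency_factor(line: str) -> float | None:
    """Return how many times I/O latency grew, or None if the line has no latency figures."""
    match = LATENCY.search(line)
    if match is None:
        return None
    before, after = (int(g) for g in match.groups())
    return after / before


if __name__ == "__main__":
    sample = [
        "ScsiDeviceIO: 1596: Device <REDACTED_NAA> performance has deteriorated. "
        "I/O latency increased from 1280us to 131801us.",
        "nmlx5_core: vmnic0: Changing link status from DOWN to UP.",
    ]
    for line in sample:
        factor = latency_factor(line)
        print(classify(line), f"{factor:.0f}x" if factor else "")
```

For the latency message in this article, the ratio is 131801 / 1280, which means latency grew roughly 103-fold.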

Environment

VMware ESXi 7.0 / 8.0

Cause

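The issue is caused by a combination of two factors. The first is physical-layer instability: faulty SFP+ transceivers or fiber cables, compounded by thermal alerts. The second is a non-standardized iSCSI network configuration, with mixed 10 Gbps and 25 Gbps link speeds and MTU mismatches along the iSCSI path.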
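Both configuration problems named above, mixed link speeds and MTU mismatches, can be checked mechanically once the iSCSI path has been inventoried. The following is a minimal sketch under that assumption. The `PathElement` record, the `find_mismatches` helper, and the inventory values are all hypothetical. In practice, the values would be gathered by hand from sources such as `esxcli network nic list` (physical NIC speed and MTU), `esxcli network ip interface list` (VMkernel MTU), and the physical switch and storage array configurations.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PathElement:
    """One hop on the iSCSI data path (hypothetical inventory record)."""
    name: str
    mtu: int
    speed_gbps: int | None = None  # None for logical objects such as a vSwitch


def find_mismatches(elements: list[PathElement], expected_mtu: int = 9000) -> list[str]:
    """Flag every hop whose MTU differs from expected_mtu, plus any mix of link speeds."""
    problems = [
        f"{e.name}: MTU {e.mtu}, expected {expected_mtu}"
        for e in elements
        if e.mtu != expected_mtu
    ]
    # Count how many physical links run at each speed; more than one key means a mix.
    speeds = Counter(e.speed_gbps for e in elements if e.speed_gbps is not None)
    if len(speeds) > 1:
        problems.append(f"mixed link speeds (Gbps): {dict(sorted(speeds.items()))}")
    return problems


# Hypothetical inventory: names and values are illustrative only.
inventory = [
    PathElement("vmnic0", mtu=9000, speed_gbps=25),
    PathElement("vmnic1", mtu=9000, speed_gbps=10),
    PathElement("vSwitch1", mtu=9000),
    PathElement("vmk1", mtu=1500),
]

for problem in find_mismatches(inventory):
    print(problem)
```

With this sample inventory, the check reports that vmk1 is still at MTU 1500 and that the uplinks mix 10 Gbps and 25 Gbps.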

Resolution

  • Physical Hardware Replacement: Replace the SFP+ transceivers and fiber optic cables for each vmnic that reports link flapping (for example, vmnic0 in the symptoms above).
  • Thermal Inspection: Check server cooling and airflow to clear the "Temperature Upper Critical" condition recorded in the BMC System Event Log (SEL).

  • Network Standardization:

    • Ensure that all physical NICs carrying iSCSI traffic run at the same 25 Gbps line rate, so that no path in the iSCSI initiator group becomes a throughput bottleneck.

    • Audit the MTU 9000 (jumbo frame) setting end to end on the vSwitch, the VMkernel ports, the physical switch ports, and the storage processors, to eliminate the Len Err (length error) condition caused by MTU mismatches.
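After the MTU audit, a common way to confirm jumbo frames end to end is to send a full-size, don't-fragment ping from the iSCSI VMkernel port. With IPv4, the ICMP payload must leave room for the 20-byte IP header and the 8-byte ICMP header, so MTU 9000 allows an 8972-byte payload. The following sketch only builds the command string using the standard `vmkping` options `-I` (outgoing VMkernel interface), `-d` (don't fragment), and `-s` (payload size). The interface `vmk1` and the target address `192.0.2.10` are hypothetical placeholders, and the arithmetic assumes IPv4.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    if payload <= 0:
        raise ValueError(f"MTU {mtu} is too small")
    return payload


def vmkping_command(vmk: str, target: str, mtu: int = 9000) -> str:
    """Build a vmkping command that sends a full-size, don't-fragment probe."""
    return f"vmkping -I {vmk} -d -s {max_icmp_payload(mtu)} {target}"


print(vmkping_command("vmk1", "192.0.2.10"))  # vmkping -I vmk1 -d -s 8972 192.0.2.10
```

Suppose the 8972-byte probe fails but a 1472-byte probe (the payload for MTU 1500) succeeds. That result suggests that at least one hop between the VMkernel port and the storage target is still at MTU 1500.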