Virtual machines experienced a network interruption followed by vMotion failures.
Log analysis shows significant I/O latency and physical link instability on the source host.
Symptoms & Error Messages:
vMotion Failure: "Migration failed: Timeout (0xbad0021)" and "Migration considered a failure by the VMX."
Storage Latency: "ScsiDeviceIO: 1596: Device <REDACTED_NAA> performance has deteriorated. I/O latency increased from 1280us to 131801us."
Link Flapping: "nmlx5_core: vmnic0: Changing link status from DOWN to UP."
Hardware Alert: "SEL Message: Assert + Temperature Upper Critical going high."
Guest Impact: "GuestRpcSendTimedOut: message to toolbox timed out."
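To triage a log bundle for these signatures, a short script can count link-state transitions per NIC and quantify the latency jump. The following is a minimal sketch, assuming the log lines resemble the excerpts quoted in this article. The `summarize` helper and the sample lines are illustrative only; real vmkernel.log entries carry timestamps and may be worded differently, so the patterns would need adjusting.

```python
import re

# Patterns modeled on the excerpts quoted in this article (assumed wording).
LATENCY_RE = re.compile(r"I/O latency increased from (\d+)us to (\d+)us")
LINK_RE = re.compile(r"(vmnic\d+): Changing link status from (\w+) to (\w+)")


def summarize(lines):
    """Return (worst latency ratio, per-NIC count of link-state transitions)."""
    worst_ratio = 0.0
    transitions = {}
    for line in lines:
        m = LATENCY_RE.search(line)
        if m:
            before, after = int(m.group(1)), int(m.group(2))
            if before > 0:
                worst_ratio = max(worst_ratio, after / before)
        m = LINK_RE.search(line)
        if m:
            nic = m.group(1)
            transitions[nic] = transitions.get(nic, 0) + 1
    return worst_ratio, transitions


# Sample input copied from the symptoms listed in this article.
sample = [
    "ScsiDeviceIO: 1596: Device <REDACTED_NAA> performance has deteriorated. "
    "I/O latency increased from 1280us to 131801us.",
    "nmlx5_core: vmnic0: Changing link status from DOWN to UP.",
]
ratio, flaps = summarize(sample)
print(f"worst latency increase: {ratio:.0f}x, link transitions: {flaps}")
```

For the values quoted above, 131801us / 1280us is roughly a 103-fold increase in I/O latency. A NIC that accumulates many transitions within a short window is a candidate for SFP+/fiber inspection.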
Environment: VMware ESXi 7.0 / 8.0
The issue is caused by a combination of physical-layer instability (SFP+/fiber faults and thermal alerts) and a non-standardized iSCSI network configuration (mixed 10G/25G link speeds and MTU mismatches).
Thermal Inspection: Verify the physical server cooling and airflow to resolve the "Temperature Upper Critical" state recorded in the BMC/SEL.
Network Standardization:
Ensure all physical NICs used for iSCSI run at the same 25 Gbps line rate to prevent throughput imbalances within the iSCSI initiator group.
Audit the MTU end to end and confirm it is set to 9000 on the vSwitch, VMkernel ports, physical switches, and storage processors to resolve Len Err (Length Errors).
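The standardization checks above amount to confirming that every hop in the iSCSI path reports the same link speed and the same MTU. The sketch below assumes the values have already been collected by hand, for example from `esxcli network nic list` and `esxcli network ip interface list` on the host plus the switch and array management interfaces. The inventory entries and the `audit` and `vmkping_payload` helpers are hypothetical and exist only for illustration. The script also derives the ICMP payload size for a don't-fragment ping test: the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.

```python
EXPECTED_MTU = 9000
EXPECTED_SPEED_MBPS = 25000

# Hypothetical inventory, for illustration only:
# (component, MTU, link speed in Mbps or None where speed does not apply)
inventory = [
    ("vmnic0", 9000, 25000),
    ("vmnic1", 9000, 10000),
    ("vSwitch1", 9000, None),
    ("vmk1", 1500, None),
    ("switch port", 9000, 25000),
    ("storage processor port", 9000, 25000),
]


def audit(items, mtu=EXPECTED_MTU, speed=EXPECTED_SPEED_MBPS):
    """Return a list of deviations from the expected MTU and link speed."""
    findings = []
    for name, item_mtu, item_speed in items:
        if item_mtu != mtu:
            findings.append(f"{name}: MTU {item_mtu}, expected {mtu}")
        if item_speed is not None and item_speed != speed:
            findings.append(f"{name}: speed {item_speed} Mbps, expected {speed}")
    return findings


def vmkping_payload(mtu):
    """ICMP payload that exactly fills one IPv4 packet of the given MTU."""
    return mtu - 20 - 8  # 20-byte IPv4 header + 8-byte ICMP header


for finding in audit(inventory):
    print(finding)
print("ping payload size for MTU 9000:", vmkping_payload(EXPECTED_MTU))
```

With an MTU of 9000 the payload works out to 8972 bytes. A test such as `vmkping -d -s 8972 -I <vmk> <target IP>` (don't-fragment set, with the VMkernel interface and target as placeholders) should succeed only if every hop in the path accepts 9000-byte frames.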