A virtual machine crashes unexpectedly and is subsequently restarted by vSphere HA during backup operations. This issue typically involves Lightweight Delta (LWD) snapshot synchronization and is triggered by storage latency that prevents the I/O filter from quiescing or buffering operations. These conditions lead to a mandatory kernel panic to ensure data integrity and prevent disk corruption.
Key indicators include SES write failures and SCSI performance deterioration warnings in host logs. The /var/run/log/fdm.log contains entries similar to:
[YYYY-MM-DDTHH:MM:SS] Db(167) Fdm[2102736]: [Originator@#### sub=FDM opID=WorkQueue-########] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/########/########/########.vmx host=host-###### tag=host-######:-#########:#
vmware.log shows LWD failures and a mandatory kernel panic:
PANIC: LWD: Failure to quiesce IOs on disk... after failure to read or write snapshot data
[YYYY-MM-DDTHH:MM:SS] Er(02) worker-2328848 293b2adb LWD: Failed to write extent range [62458943,62458954] for the SES /vmfs/volumes/########/########/########.ses. Error 15: IO was aborted
[YYYY-MM-DDTHH:MM:SS] Wa(03) worker-2328848 293b2adb LWD: Failure writing to SES, snapshot must be canceled, disk 6187408BF0: IO was aborted (15)
[YYYY-MM-DDTHH:MM:SS] Wa(03) worker-2328848 293b2adb LWD: The disk 6187408BF0 has failed SES writes, additional chunk processing is not possible
[YYYY-MM-DDTHH:MM:SS] Er(02) worker-2133979 5ea2d801 LWD: Failed to read extent range [31294929,31294960], offset range [256368058368, 262144], from disk 6187408BF0 (capacity 547608330240), readTxn 10947. Error: 15: IO was aborted
[YYYY-MM-DDTHH:MM:SS] Wa(03) worker-2133979 5ea2d801 LWD: LwdIoTracker_CancelAllIOs: '11' IOs did not complete in the stipulated time for IoTracker 61ED0654A0, disk 6187408BF0......
[YYYY-MM-DDTHH:MM:SS] Er(02) Upcall-c2e97 eeceb37e LWD: Failed to read extent range [31294684,31294707], offset range [256366051328, 196608], from disk 6187408BF0 (capacity 547608330240), readTxn 10946. Error: 18: Busy
[YYYY-MM-DDTHH:MM:SS] Wa(03) worker-2134172 eeceb37e LWD: Resetting SES IO status from 18 (Busy) to BUSY as IOs are not permitted, disk 6187408BF0
/var/run/log/vmkernel.log:
[YYYY-MM-DDTHH:MM:SS] In(182) vmkernel: cpu72:5586470)ScsiDeviceIO: 5789: Restricting cmd 0x28 (4096 bytes) from WID 5586470 to quiesced dev naa.########## (vmkCmd=0x45### cmdId: 0x430###, 0x6####)
[YYYY-MM-DDTHH:MM:SS] Wa(180) vmkwarning: cpu84:5585612)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:235: NMP device "naa.##########" state in doubt; requested fast path state update...
[YYYY-MM-DDTHH:MM:SS] In(182) vmkernel: cpu2:5585613)ScsiDeviceIO: 4686: Cmd(0x45da691b61c0) 0xfe, CmdSN 0x6xxx from world 210#### to dev "naa.##########" failed H:0x8 D:0x0 P:0x0
[YYYY-MM-DDTHH:MM:SS] In(182) vmkernel: cpu84:5585612)ScsiDeviceIO: 4686: Cmd(0x45da60622e80) 0xfe, CmdSN 0x6xxx from world 210#### to dev "naa.##########" failed H:0x8 D:0x0 P:0x0
[YYYY-MM-DDTHH:MM:SS] Wa(180) vmkwarning: cpu72:2099107)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:235: NMP device "naa.##########" state in doubt; requested fast path state update...
VMware vCenter Server 8.0.x
VMware ESXi 8.0.x
The issue is caused by VM I/O exhaustion combined with high storage latency during Lightweight Delta (LWD) backup operations. When latency exceeds the tolerance of the LWD filter (for example, spikes of 5000 ms or more), the filter cannot quiesce I/O. The VMX process then initiates a mandatory kernel panic as a safety mechanism to prevent corruption of the base disk or snapshot chain.
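To check whether latency on a datastore approached the kind of spike described above, exported latency samples can be filtered against a threshold. The following is a minimal sketch only. The `LatencySample` record, the `find_spikes` helper, and the sample values are hypothetical, and the 5000 ms default simply reuses the example figure from this article; it is not a documented LWD filter limit.

```python
from dataclasses import dataclass

# Assumption: 5000 ms is the example spike size quoted in this article,
# not a published tolerance of the LWD filter.
SPIKE_THRESHOLD_MS = 5000.0


@dataclass
class LatencySample:
    """One latency reading for a device (hypothetical record layout)."""
    timestamp: str
    device: str
    latency_ms: float


def find_spikes(samples, threshold_ms=SPIKE_THRESHOLD_MS):
    """Return the samples whose latency is at or above the threshold."""
    return [s for s in samples if s.latency_ms >= threshold_ms]


# Illustrative values only; real readings would come from whatever
# performance export is available for the affected LUNs.
example_samples = [
    LatencySample("T1", "naa.example", 12.0),
    LatencySample("T2", "naa.example", 5200.0),
    LatencySample("T3", "naa.example", 4999.9),
]

for spike in find_spikes(example_samples):
    print(f"{spike.timestamp} {spike.device} {spike.latency_ms:.0f} ms")
```

Comparing the timestamps of any spikes with the backup window helps confirm whether the latency coincided with LWD snapshot activity.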
Storage Remediation: Engage the storage hardware vendor to investigate the physical array and SAN fabric for "State in Doubt" and performance degradation events on the affected LUNs.
Backup Optimization: Consult the backup software vendor to implement I/O throttling for LWD tasks to prevent subsystem saturation.
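Before engaging either vendor, it can help to summarize how often each indicator appears in the collected logs and which devices are named. The sketch below is one possible approach, not a supported tool. The regular expressions are derived from the log excerpts in this article, while the category names and the `scan_lines` helper are hypothetical.

```python
import re
from collections import Counter

# Patterns taken from the log excerpts in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "ha_restart": re.compile(r"com\.vmware\.vc\.ha\.VmRestartedByHAEvent"),
    "lwd_panic": re.compile(r"PANIC: LWD: Failure to quiesce IOs"),
    "ses_write_failure": re.compile(r"LWD: Failure writing to SES"),
    "lwd_read_failure": re.compile(r"LWD: Failed to read extent range"),
    "state_in_doubt": re.compile(r'NMP device "([^"]+)" state in doubt'),
    "scsi_cmd_failed_h8": re.compile(r'to dev "([^"]+)" failed H:0x8'),
}


def scan_lines(lines):
    """Count indicator matches and tally the devices they mention."""
    counts = Counter()
    devices = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match is None:
                continue
            counts[name] += 1
            # Only the device-related patterns define a capture group.
            if match.groups():
                devices[match.group(1)] += 1
    return counts, devices


# Shortened sample lines modeled on the excerpts above.
sample_lines = [
    'WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:235: NMP device "naa.A" state in doubt; requested fast path state update...',
    'ScsiDeviceIO: 4686: Cmd(0x45da691b61c0) 0xfe, CmdSN 0x6xxx from world 210#### to dev "naa.A" failed H:0x8 D:0x0 P:0x0',
    "LWD: Failure writing to SES, snapshot must be canceled, disk 6187408BF0: IO was aborted (15)",
]

sample_counts, sample_devices = scan_lines(sample_lines)
print(dict(sample_counts))
print(dict(sample_devices))
```

To run it against real logs, pass an open file object for vmware.log, vmkernel.log, or fdm.log to `scan_lines`, since iterating a file yields its lines.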