1. Performing an LWD-based snapshot sync fails.
2. Unable to Storage vMotion some VMs; the virtual machine freezes and becomes unresponsive, and the task fails at 40%.
3. Unable to back up VMs using Dell PPDM (PowerProtect Data Manager):
   Task Name: Perform LWD-based snapshot sync
   Status: Cannot complete the operation. See the event log for details. Failed to transport snapshot data
   Initiator: com.vmware.dp
   Target: VMware-VM
   Server: vCenter.broadcom.local
   Error stack: Failed to transport snapshot data
VMware.log:
2025-02-27T13:54:07.627Z In(05) worker-2653543 1381770e LWD: Preparing live migration type=SvMotion on VMName_VM
2025-02-27T13:54:07.638Z In(05) vmx - SVMotion: Enter Phase 1
2025-02-27T13:54:07.729Z In(05) worker-2653550 - SVMotion: Enter Phase 2
2025-02-27T13:54:07.730Z In(05) worker-2653550 - SVMotionDiskGetCreateExtParams: not using a storage policy to create disk '/vmfs/volumes/5e6136e8-########-####-a0369f19c094/Datastore/VMName_VM.vmdk'
2025-02-27T13:54:09.356Z In(05) worker-2653550 7283fbbf SVMotionDiskGetCreateExtParams: not using a storage policy to create disk '/vmfs/volumes/5e6136e8-########-####-a0369f19c094/Datastore/VMName_VM.vmdk'
2025-02-27T13:54:10.994Z In(05) worker-2653550 7283fbbf SVMotion: Enter Phase 3
2025-02-27T13:54:11.169Z In(05) worker-2653550 7283fbbf SVMotionLocalDiskQueryInfo: Got block size 1048576 for filesystem VMFS.
2025-02-27T13:54:11.348Z In(05) worker-2653550 7283fbbf SVMotionLocalDiskQueryInfo: Got block size 1048576 for filesystem VMFS.
2025-02-27T13:54:11.348Z In(05) worker-2653550 7283fbbf SVMotion: Enter Phase 4
2025-02-27T13:54:11.450Z In(05) worker-2653550 7283fbbf SVMotion: Enter Phase 5
2025-02-27T13:54:11.450Z In(05) worker-2653550 7283fbbf SVMotion: Enter Phase 6
2025-02-27T13:54:11.463Z In(05) worker-2653543 7283fbbf SVMotion: Enter Phase 7
2025-02-27T13:54:11.468Z In(05) worker-2653547 7283fbbf SVMotion: Enter Phase 8
2025-02-27T13:59:40.292Z In(05) vmx - SVMotion: scsi0:1: Disk copy completed for total 245760 MB at 765326 kB/s.
2025-02-27T14:02:40.375Z Wa(03) vmx - SVMotion: scsi0:0: Disk transfer rate slow: 0 kB/s over the last 10.01 seconds, copied total 62080 MB at 353005 kB/s.
2025-02-27T14:03:03.036Z In(05) vmx 627d3af4 SVMotion: Enter Phase 12
2025-02-27T14:03:03.036Z In(05) vmx 627d3af4 SVMotion_Cleanup: Scheduling cleanup thread.
2025-02-27T14:03:03.036Z Wa(03) worker-2653547 7283fbbf SVMotionMirroredModeThreadDiskCopy: Found internal error when woken up on diskCopySemaphore. Aborting storage vmotion.
2025-02-27T14:03:03.036Z In(05) worker-2653543 627d3af4 SVMotionCleanupThread: Waiting for SVMotion Bitmap thread to complete.
2025-02-27T14:03:03.036Z In(05) worker-2653543 627d3af4 SVMotionCleanupThread: Waiting for SVMotion thread to complete.
2025-02-27T14:03:03.036Z Wa(03) worker-2653547 7283fbbf SVMotionCopyThread: disk copy failed. Canceling Storage vMotion.
2025-02-27T14:03:03.036Z In(05) worker-2653547 7283fbbf SVMotionCopyThread: Waiting for SVMotion Bitmap thread to complete before issuing a stun during migration failure cleanup.
2025-02-27T14:03:03.037Z In(05) worker-2653547 7283fbbf SVMotion: FailureCleanup thread completes.
2025-02-27T14:03:03.037Z In(05) worker-2653543 7283fbbf SVMotion: Worker thread performing SVMotionCopyThreadDone exited.
2025-02-27T14:03:03.037Z In(05) worker-2653543 - SVMotionCleanupThread: Waiting for the cleanup semaphore to be signaled so that it is safe for the cleanup thread to proceed.
2025-02-27T14:03:05.043Z In(05) vmx 7283fbbf [msg.svmotion.fail.internal] A fatal internal error occurred. See the virtual machine's log for more details.
2025-02-27T14:03:05.043Z In(05) vmx 7283fbbf [msg.svmotion.disk.copyphase.failed] Failed to copy one or more disks.
Read I/Os are aborted by the VMkernel:
2025-03-07T03:03:26.811Z Er(02) Upcall-1cd224cc 4f2e9509 LWD: Failed to read extent range [248372,248372], offset range [65109229568, 262144], from disk 4095126750 (capacity 214748364800), readTxn 142. Error: 15: IO was aborted
2025-03-07T03:03:30.369Z In(05) worker-3581520 35be92ef-92f0 LWD: Handling FinishFullSync message for disk '87a63c76-####-####-####-0867148aeab2', sync 61181b23-aa75-405e-5c39-141fe3ed543e success 'false'
VMkernel.log:
Write commands fail with busy errors from the target:
2025-03-07T03:03:15.541Z In(182) vmkernel: cpu13:2098156)NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x28 (0x45b9bfc576c0, 3552458) to dev "naa.################################" on path "vmhba2:C0:T2:L114" Failed:
2025-03-07T03:03:15.541Z In(182) vmkernel: cpu13:2098156)NMP: nmp_ThrottleLogForDevice:3898: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0x3. Act:FAILOVER. cmdId.initiator=0x431cbe16c930 CmdSN 0x431cd6c042d0
2025-03-07T03:03:15.541Z Wa(180) vmkwarning: cpu13:2098156)WARNING: NMP: nmp_DeviceRetryCommand:130: Device "naa.################################": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
2025-03-07T03:03:16.528Z Wa(180) vmkwarning: cpu1:2097944)WARNING: NMP: nmpDeviceAttemptFailover:644: Retry world failover device "naa.################################" - issuing command 0x45b9bfc576c0
| Sense Key | [0x2] | NOT READY |
| Additional Sense Data | [0x04/0x03] | LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED |
| OP Code | [0x28] | READ(10) |
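When reviewing large logs for this pattern, the H:/D:/P: status and sense fields can be decoded mechanically. The following Python sketch is illustrative only and is not a VMware utility; the lookup tables cover just the codes that appear in this article, and the pattern and function names are arbitrary choices.

```python
import re

# Illustrative decoder for the status/sense fields in NMP/ScsiDeviceIO failure lines.
# The lookup tables below cover only the codes shown in this article.
HOST_STATUS = {0x0: "OK", 0x5: "ABORT", 0x8: "RESET"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION"}
SENSE_KEYS = {0x2: "NOT READY"}
ASC_ASCQ = {(0x04, 0x03): "LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED"}

PATTERN = re.compile(
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)"
    r"(?: Valid sense data: (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+))?"
)

def decode_nmp_status(line):
    """Return a human-readable summary of the H:/D:/P: and sense fields, if present."""
    m = PATTERN.search(line)
    if not m:
        return None
    host, dev, plugin = (int(m.group(i), 16) for i in (1, 2, 3))
    parts = [
        f"Host: {HOST_STATUS.get(host, hex(host))}",
        f"Device: {DEVICE_STATUS.get(dev, hex(dev))}",
        f"Plugin: {hex(plugin)}",
    ]
    if m.group(4):
        key, asc, ascq = (int(m.group(i), 16) for i in (4, 5, 6))
        parts.append(f"Sense key: {SENSE_KEYS.get(key, hex(key))}")
        parts.append(f"ASC/ASCQ: {ASC_ASCQ.get((asc, ascq), f'{asc:#x}/{ascq:#x}')}")
    return ", ".join(parts)

print(decode_nmp_status("H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0x3."))
# Host: OK, Device: CHECK CONDITION, Plugin: 0x0, Sense key: NOT READY,
# ASC/ASCQ: LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED
```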
The command is retried multiple times, but the target keeps returning busy:
2025-03-07T03:03:20.080Z Wa(180) vmkwarning: cpu18:2098156)WARNING: NMP: nmpCompleteRetryForPath:356: Retry cmd 0x28 (0x45b9bfc576c0) to dev "naa.################################" failed on path "vmhba2:C0:T2:L114" H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0x3.
2025-03-07T03:03:20.080Z Wa(180) vmkwarning: cpu18:2098156)WARNING: NMP: nmpCompleteRetryForPath:391: Logical device "naa.################################": awaiting fast path state update before retrying failed command again...
Eventually, the command is aborted as part of a virt reset:
2025-03-07T03:03:26.811Z Wa(180) vmkwarning: cpu0:2098156)WARNING: NMP: nmpCompleteRetryForPath:356: Retry cmd 0x28 (0x45b9bfc576c0) to dev "naa.################################" failed on path "vmhba2:C0:T2:L114" H:0x8 D:0x0 P:0x0 .
2025-03-07T03:03:26.811Z Wa(180) vmkwarning: cpu0:2098156)WARNING: NMP: nmpCompleteRetryForPath:443: Retry world restored device "naa.################################" - no more commands to retry
2025-03-07T03:03:26.811Z Wa(180) vmkwarning: cpu0:2098156)WARNING: NMP: nmpCompleteRetryForPath:457: NMP device "naa.################################": requested fast path state update...
2025-03-07T03:03:26.811Z In(182) vmkernel: cpu0:2098156)ScsiDeviceIO: 4591: Cmd(0x45b9bfc576c0) 0x28, cmdId.initiator=0x431cbe16c930 CmdSN 0x431cd6c042d0 from world 3552458 to dev "naa.################################" failed H:0x8 D:0x0 P:0x0 Cancelled from driver layer.
| Host Status | [0x8] | RESET | This status is returned when the HBA driver has aborted the I/O. It can also occur if the HBA does a reset of the target. |
2025-03-01T02:44:00.779Z In(182) vmkernel: cpu0:2097235)ScsiDeviceIO: 4656: Cmd(0x45d9aa86ba40) 0x89, cmdId.initiator=0x4309476f3a80 CmdSN 0x8b4fb from world 2097224 to dev "naa.################################" failed H:0x5 D:0x0 P:0x0 Cancelled from device layer.
2025-03-01T02:44:00.779Z In(182) vmkernel: cpu0:2097235)Cmd count Active:1 Queued:24
2025-03-01T02:47:21.784Z In(182) vmkernel: cpu1:2097237)ScsiDeviceIO: 4656: Cmd(0x45d9f5d5a900) 0x89, cmdId.initiator=0x4309476f3a80 CmdSN 0x8b701 from world 2097224 to dev "naa.################################" failed H:0x5 D:0x0 P:0x0 Cancelled from device layer.
2025-03-01T02:47:21.784Z In(182) vmkernel: cpu1:2097237)Cmd count Active:1 Queued:20
2025-03-03T02:53:27.141Z In(182) vmkernel: cpu0:2097236)ScsiDeviceIO: 4656: Cmd(0x45d9d32c9180) 0x89, cmdId.initiator=0x4309476f3a80 CmdSN 0xc7593 from world 2097224 to dev "naa.################################" failed H:0x5 D:0x0 P:0x0 Cancelled from device layer.
2025-03-03T02:53:27.141Z In(182) vmkernel: cpu0:2097236)Cmd count Active:1 Queued:20
| Host Status | [0x5] | ABORT | This status is returned if the driver has to abort commands in-flight to the target. This can occur due to a command timeout or parity error in the frame. |
| OP Code | [0x89] | COMPARE AND WRITE |
VMware vSphere 7.x
VMware vSphere 8.x
Based on the errors recorded in the logs, the storage array appears to be having difficulty processing ATS (COMPARE AND WRITE, opcode 0x89) and other I/O commands, which results in the failures described above.
Please reach out to your storage vendor to raise a case for further investigation and analysis.
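When raising the case, it may help to quantify how often the failures occur and on which devices. The sketch below is a minimal, illustrative example (not a VMware or vendor tool); it assumes you have saved a copy of vmkernel.log locally (for example, extracted from a host support bundle) and it only matches the "failed H:" lines shown above.

```python
import re
from collections import Counter

# Illustrative only: count failed commands per device and host status in a
# locally saved copy of vmkernel.log (e.g. extracted from a support bundle).
LOG_PATH = "vmkernel.log"  # example path; adjust to your copy of the log

FAILED_CMD = re.compile(
    r'to dev "(?P<dev>naa\.[^"]+)"\s*failed H:(?P<host>0x[0-9a-fA-F]+)'
)

counts = Counter()
with open(LOG_PATH, errors="replace") as log:
    for line in log:
        m = FAILED_CMD.search(line)
        if m:
            counts[(m.group("dev"), m.group("host"))] += 1

for (dev, host), n in counts.most_common():
    print(f"{dev}: {n} failed command(s) with host status {host}")
```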