VM cloning, Storage vMotion, or VM backup task fails with "Failed to copy one or more disks" or "A fatal internal error occurred"

Products

VMware vSphere ESX 6.x
VMware vSphere ESX 7.x
VMware vSphere ESX 8.x

Issue/Introduction

Storage vMotion of a virtual machine fails

You see errors similar to:

Failed to copy one or more disks.
Could not complete network copy for the file "<Path of the VMDK>"

Or

A fatal internal error occurred. See the virtual machine's log for more details. 
YYYY-MM-DD T HH:MMZ Failed waiting for data. Error 195887107. Not found. 
YYYY-MM-DD T HH:MMZ Failed to copy source (/vmfs/volumes/[DATASTORE_UUID]/[VM_NAME]/[DISK_NAME].vmdk) 
to destination (/vmfs/volumes/[DATASTORE_UUID]/[VM_NAME]/[DISK_NAME].vmdk): 
I/O error. Failed to copy one or more disks.

In the /var/log/vpxa.log file on the ESXi host, you see the error:

A fatal internal error occurred. See the virtual machine's log for more details. Failed to copy one or more disks.

In the /vmfs/volumes/datastore/vm_name/vmware.log file on the ESXi host, you see entries similar to:

YYYY-MM-DD TIME.752Z| vmx| W110: SvMotion: scsi0:0: Failed to copy disk: I/O error
YYYY-MM-DD TIME.752Z| Worker#1| W110: SVMotionMirroredModeThreadDiskCopy: Found internal error when woken up on diskCopySemaphore. Aborting storage vMotion.

In the /var/log/vmkernel.log file on the ESXi host, you see entries similar to:

YYYY-MM-DD TIME.641Z cpu4:33440)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x28 (0x412e40448bc0, 7812129) to dev "<naa ID>" on path "vmhba2:CX:TX:LX" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0. Act:NONE
YYYY-MM-DD TIME.641Z cpu4:33440)ScsiDeviceIO: 2337: Cmd(0x412e40448bc0) 0x28, CmdSN 0x5671fa from world 7812129 to dev "<naa ID>" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0.

YYYY-MM-DD T HH:MMZ cpu59:12873043)ScsiDeviceIO: 2652: Cmd(0x43be4f266680) 0x28, CmdSN 0xb062f5 from world 11097888 to dev <naa ID> failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x0 0x0.

YYYY-MM-DD T HH:MMZ In(182) vmkernel: cpu78:[CPU_ID] opID=[OP_ID] BC: 608: read from [VM_DISK_NAME].vmdk 
([DATASTORE_INFO]) 131072 bytes failed: I/O error

YYYY-MM-DD T HH:MMZ In(182) vmkernel: cpu103:[CPU_ID]) ScsiDeviceIO: 4686: Cmd([CMD_ID]) 0x88, CmdSN [CMD_SN] from world [WORLD_ID]
H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0 Medium Error, LBA:[LBA_INFO]

The backup job for the virtual machine fails

The Veritas client reports errors similar to:

MMM DD, YYYY, hh:mm:ss - Info bptm (pid=#####) EXITING with status 25 <----------
MMM DD, YYYY, hh:mm:ss - Info <backup_master> (pid=#####) StorageServer=PureDisk:<backup_master>; Report=PDDO Stats (multi-threaded stream used) for (<backup_master>:volume): scanned: ######### KB, CR sent: ######## KB, CR sent over FC: 0 KB, dedup: ##%, cache disabled, where dedup space saving:##%, compression space saving:##%, new transferred data unencrypted
MMM DD, YYYY, hh:mm:ss - Error bpbrm (pid=#####) could not send server status message to client
MMM DD, YYYY, hh:mm:ss - Info bpbkar32 (pid=#####) done. status: 11: system call failed
MMM DD, YYYY, hh:mm:ss - Error nbpem (pid=#####) backup of client <FQDN of the client> exited with status 11 (system call failed)

In the /var/log/vmkernel.log file on the source ESXi host, you see entries similar to:

YYYY-MM-DD T HH:MMZ cpu5:2098305)WARNING: SVM: 3769: scsi0:1 Failed to issue SvMotion async read IO, exhausted 10 retries: 70000, I/O error
YYYY-MM-DD T HH:MMZ cpu5:2098305)WARNING: SVM: 3769: scsi0:1 Failed to issue SvMotion async read IO, exhausted 10 retries: 70000, I/O error
YYYY-MM-DD T HH:MMZ cpu5:2098305)WARNING: SVM: 3769: scsi0:1 Failed to issue SvMotion async read IO, exhausted 10 retries: 70000, I/O error
YYYY-MM-DD T HH:MMZ cpu3:37819971)WARNING: SVM: 4237: scsi0:1 Failed to read blocks from disk: I/O error
YYYY-MM-DD T HH:MMZ cpu5:2098305)WARNING: SVM: 3769: scsi0:1 Failed to issue SvMotion async read IO, exhausted 10 retries: 70000, I/O error
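
The virtual device node in these entries (scsi0:0 or scsi0:1 in the samples above) indicates which of the virtual machine's disks could not be read. As a rough sketch, the nodes can be collected from vmware.log or vmkernel.log with a few lines of Python. The function name affected_disks is hypothetical, and the pattern is an assumption based only on the sample lines shown in this article.

```python
import re

# Matches the device node in lines such as
#   "SvMotion: scsi0:0: Failed to copy disk: I/O error"
#   "SVM: 3769: scsi0:1 Failed to issue SvMotion async read IO, ..."
# Assumption: the node is always followed by "Failed to", as in the samples.
DISK_RE = re.compile(r"\b(scsi\d+:\d+):? Failed to ")


def affected_disks(lines):
    """Return the sorted, de-duplicated device nodes named in failure lines."""
    nodes = set()
    for line in lines:
        match = DISK_RE.search(line)
        if match:
            nodes.add(match.group(1))
    return sorted(nodes)
```

Passing an open log file (for example, open("vmkernel.log", errors="replace")) as lines is enough, because the function only iterates over it.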

Environment

VMware vSphere ESXi host 8.X
VMware vSphere ESXi host 7.X
VMware vSphere ESXi host 6.X

Cause

This issue occurs when the ESXi host cannot read data from the datastore that holds the virtual machine's disks during Storage vMotion, cloning, or backup jobs.

Resolution

This behavior is expected when a disk has failed and is returning a medium error (0x3).

To confirm this, identify the SCSI sense code of the failed command in /var/log/vmkernel.log. For example: H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x0 0x0

Once identified, use the sense code decoder available at the link below to interpret the error and determine the underlying cause.

H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0

Sense Key
[0x3] | MEDIUM ERROR
Additional Sense Data
11/00 | UNRECOVERED READ ERROR

You may also see H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x0 0x0 in the logs:

Sense Key
[0x4] | HARDWARE ERROR

Or H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0 in the logs:

Sense Key
[0x5] | ILLEGAL REQUEST
Additional Sense Data
25/00 | LOGICAL UNIT NOT SUPPORTED
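
The decoding above can be sketched in Python as follows. This is only an illustration: the function name decode_sense and both lookup tables are hypothetical, not part of any VMware tool. The sense key names follow the SCSI standard, while the additional sense table deliberately lists only the two ASC/ASCQ pairs named in this article.

```python
import re

# Sense key names as defined by the SCSI Primary Commands standard (subset).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0xB: "ABORTED COMMAND",
}

# Only the ASC/ASCQ pairs mentioned in this article; extend as needed.
ADDITIONAL_SENSE = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x25, 0x00): "LOGICAL UNIT NOT SUPPORTED",
}

# H = host status, D = device status, P = plugin status, followed by
# sense key, ASC, and ASCQ, in the layout used by vmkernel.log.
SENSE_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+) "
    r"Valid sense data: (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)"
)


def decode_sense(line):
    """Decode the status and sense fields of one log line, or return None."""
    match = SENSE_RE.search(line)
    if match is None:
        return None
    host, device, plugin, key, asc, ascq = (int(v, 16) for v in match.groups())
    return {
        "host_status": host,
        "device_status": device,  # 0x2 is the SCSI CHECK CONDITION status
        "plugin_status": plugin,
        "sense_key": SENSE_KEYS.get(key, f"UNKNOWN (0x{key:x})"),
        "additional_sense": ADDITIONAL_SENSE.get(
            (asc, ascq), f"{asc:02X}/{ascq:02X} (not in table)"
        ),
    }
```

For a line containing "H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0", the function reports the sense key MEDIUM ERROR and the additional sense UNRECOVERED READ ERROR. Pairs missing from the small table are returned in ASC/ASCQ form so they can be looked up in a full decoder.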

The error codes above originate from the underlying storage hardware and indicate disk-related errors. To address this issue, contact the storage vendor to investigate and resolve the disk errors.
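
When opening a case with the storage vendor, it may help to summarize which devices returned which sense data. The following Python sketch tallies failed commands per device from vmkernel.log lines. The function name tally_failures is hypothetical, and the pattern assumes the "to dev ... H:... D:0x2 P:... Valid sense data: ..." layout of the sample entries in this article; other ESXi builds may log a different layout.

```python
import re
from collections import Counter

# Captures the device ID (with or without surrounding quotes) and the
# sense key / ASC / ASCQ of commands that ended with D:0x2 (CHECK CONDITION).
FAIL_RE = re.compile(
    r'to dev "?(?P<dev>[^"\s]+)"?.*?'
    r"D:0x2 P:0x[0-9a-fA-F]+ Valid sense data: "
    r"(?P<key>0x[0-9a-fA-F]+) (?P<asc>0x[0-9a-fA-F]+) (?P<ascq>0x[0-9a-fA-F]+)"
)


def tally_failures(lines):
    """Count matching log lines per (device, (sense key, ASC, ASCQ))."""
    counts = Counter()
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            sense = tuple(int(match.group(g), 16) for g in ("key", "asc", "ascq"))
            counts[(match.group("dev"), sense)] += 1
    return counts
```

The counts are per log line, not per command: the NMP and ScsiDeviceIO layers can each log the same failed command, so the totals are best read as a relative indication of which device is affected rather than an exact error count.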

For additional details on SCSI sense codes within a VMware environment, refer to “Interpreting SCSI sense codes in VMware ESXi.”
You can also decode specific sense codes using the following resource: https://www.virten.net/vmware/esxi-scsi-sense-code-decoder/

Workaround:
Note: If the storage vendor is unable to resolve the disk errors, the only available workaround is to recreate or restore the affected virtual machines from a backup.