VM consolidation fails with the following error:
Error: Failed to copy source to destination VMDK due to a timeout.
vmware.log
YYYY-MM-DDTHH:MM:SS.msZ In(##) vmx - SnapshotESXCombineProgressTotal: Snapshot consolidation progress: 97
YYYY-MM-DDTHH:MM:SS.msZ In(##) vmx - SnapshotESXCombineProgressTotal: Snapshot consolidation progress: 97
...
YYYY-MM-DDTHH:MM:SS.msZ In(##) vcpu-0 - ConsolidateEnd: Snapshot consolidate complete: The operation failed (36).
hostd.log
YYYY-MM-DDTHH:MM:SS.msZ In(###) Hostd[#######] [Originator@#### sub=Vimsvc.TaskManager opID=########-##-#### sid=######## user=vpxuser:VSPHERE.LOCAL\vpxd-extension-03ee1d7c-####-####-####-1b8f9c6d38e5] Task Completed : haTask--vim.vslm.host.CatalogSyncManager.queryCatalogChange-######## Status success
YYYY-MM-DDTHH:MM:SS.msZ Db(###) Hostd[#######] [Originator@#### sub=Vigor.Vmsvc.vm:/vmfs/volumes/DataStoreName/VM_Name/VM_Name.vmx] Consolidate Disks message: Failed to copy source (/vmfs/volumes/DataStoreName/VM_Name/VM_Name-000003.vmdk) to destination (/vmfs/volumes/DataStoreName/VM_Name/VM_Name-000001.vmdk): Timeout.
YYYY-MM-DDTHH:MM:SS.msZ Db(###) Hostd[#######] --> Failed to get copy progress while consolidating disks from '/vmfs/volumes/DataStoreName/VM_Name/VM_Name-000003.vmdk' to '/vmfs/volumes/DataStoreName/VM_Name/VM_Name-000001.vmdk'.
YYYY-MM-DDTHH:MM:SS.msZ Db(###) Hostd[#######] --> Consolidation failed for disk node 'scsi0:0': The operation failed.
YYYY-MM-DDTHH:MM:SS.msZ Db(###) Hostd[#######] --> An error occurred while consolidating disks: The operation failed.
YYYY-MM-DDTHH:MM:SS.msZ Db(###) Hostd[#######] -->
YYYY-MM-DDTHH:MM:SS.msZ In(###) Hostd[#######] [Originator@#### sub=Vimsvc.ha-eventmgr] Event ####### : Virtual machine VM_Name disks consolidation failed on esxi01.example.com in cluster esxi01.example.com in ha-datacenter.
The following log snippets may also appear after the error mentioned above.
YYYY-MM-DDTHH:MM:SS.msZ Er(###) Hostd[#######] [Originator@6876 sub=DiskLib opID=WorkQueue-########-##-#### sid=######## user=vpxuser:<no user>] DISKLIB-CHAIN : DiskChainDBGet: Cannot apply logicalSectorSize to partial chain.
YYYY-MM-DDTHH:MM:SS.msZ Er(###) Hostd[#######] [Originator@6876 sub=DiskLib opID=WorkQueue-########-##-#### sid=######## user=vpxuser:<no user>] DISKLIB-CHAIN : DiskChainDBGet: Cannot apply physicalSectorSize to partial chain.
YYYY-MM-DDTHH:MM:SS.msZ Er(###) Hostd[#######] [Originator@6876 sub=DiskLib opID=WorkQueue-########-##-#### sid=######## user=vpxuser:<no user>] DISKLIB-CHAIN : DiskChainDBGet: Cannot apply logicalSectorSize to partial chain.
YYYY-MM-DDTHH:MM:SS.msZ Er(###) Hostd[#######] [Originator@6876 sub=DiskLib opID=WorkQueue-########-##-#### sid=######## user=vpxuser:<no user>] DISKLIB-CHAIN : DiskChainDBGet: Cannot apply physicalSectorSize to partial chain.
VMware vSphere ESXi
Because both VMDKs reside on the same storage, this failure would not normally occur. However, copying the larger VMDK is more susceptible to the impact of high latency than the smaller one. We therefore recommend checking the host for storage performance and accessibility issues. Below is a sample showing the observed performance degradation.
vmkernel.log
YYYY-MM-DDTHH:MM:SS.msZ In(###) vmkernel: cpu18:2097893)ScsiDeviceIO: ####: Device naa.############################## performance has improved. I/O latency reduced from 36409 microseconds to 715356 microseconds.
YYYY-MM-DDTHH:MM:SS.msZ Wa(###) vmkwarning: cpu25:2097890)WARNING: ScsiDeviceIO: ####: Device naa.############################## performance has deteriorated. I/O latency increased from average value of 1518 microseconds to 305484 microseconds.
YYYY-MM-DDTHH:MM:SS.msZ Wa(###) vmkwarning: cpu6:2097893)WARNING: ScsiDeviceIO: ####: Device naa.############################## performance has deteriorated. I/O latency increased from average value of 1533 microseconds to 475323 microseconds.
YYYY-MM-DDTHH:MM:SS.msZ In(###) vmkernel: cpu16:2097891)ScsiDeviceIO: ####: Device naa.############################## performance has improved. I/O latency reduced from 475323 microseconds to 80625 microseconds.
YYYY-MM-DDTHH:MM:SS.msZ Wa(###) vmkwarning: cpu6:2097893)WARNING: ScsiDeviceIO: ####: Device naa.############################## performance has deteriorated. I/O latency increased from average value of 1538 microseconds to 474112 microseconds.
YYYY-MM-DDTHH:MM:SS.msZ In(###) vmkernel: cpu30:2097893)ScsiDeviceIO: ####: Device naa.############################## performance has improved. I/O latency reduced from 474112 microseconds to 93814 microseconds.
YYYY-MM-DDTHH:MM:SS.msZ Wa(###) vmkwarning: cpu6:2097890)WARNING: ScsiDeviceIO: ####: Device naa.############################## performance has deteriorated. I/O latency increased from average value of 1543 microseconds to 486743 microseconds.
YYYY-MM-DDTHH:MM:SS.msZ In(###) vmkernel: cpu12:2097891)ScsiDeviceIO: ####: Device naa.############################## performance has improved. I/O latency reduced from 486743 microseconds to 87793 microseconds.
YYYY-MM-DDTHH:MM:SS.msZ Wa(###) vmkwarning: cpu6:2097893)WARNING: ScsiDeviceIO: ####: Device naa.############################## performance has deteriorated. I/O latency increased from average value of 1583 microseconds to 490084 microseconds.
YYYY-MM-DDTHH:MM:SS.msZ In(###) vmkernel: cpu4:2097890)ScsiDeviceIO: ####: Device naa.############################## performance has improved. I/O latency reduced from 490084 microseconds to 77433 microseconds.
As the snippets above show, storage latency for device naa.############################## fluctuated repeatedly: it spiked, recovered, and spiked again. Because consolidation performs intensive read and write operations, these latency spikes directly prevented it from completing.
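To gauge how widespread these warnings are, the vmkernel log can be summarized with a quick pipeline. The sketch below is illustrative: the device ID and sample lines are made up, and on a live ESXi host you would point it at /var/log/vmkernel.log instead of the embedded sample.

```shell
#!/bin/sh
# Hedged sketch: count ScsiDeviceIO "deteriorated" warnings per device.
# A small embedded sample (fabricated device ID) stands in for the real
# /var/log/vmkernel.log so the pipeline can be shown end to end.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
... WARNING: ScsiDeviceIO: 1234: Device naa.600a0b80001234560000abcd performance has deteriorated. I/O latency increased from average value of 1533 microseconds to 475323 microseconds.
... WARNING: ScsiDeviceIO: 1234: Device naa.600a0b80001234560000abcd performance has deteriorated. I/O latency increased from average value of 1543 microseconds to 486743 microseconds.
EOF
# Count warnings per device, most affected device first.
summary=$(grep -Eo 'Device naa\.[0-9a-f]+ performance has deteriorated' "$LOG" \
  | awk '{print $2}' | sort | uniq -c | sort -rn)
echo "$summary"
rm -f "$LOG"
```

A device that appears many times in this summary is the first candidate for a storage-path or array-side investigation.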
It is recommended to review and address the storage performance issues, take the necessary steps to improve overall performance, and then retry the VM consolidation.
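Before retrying, one quick sanity check is to confirm that recently reported latencies have dropped back below an acceptable ceiling. The sketch below extracts the peak latency (in microseconds) from the warning lines and compares it against a 100 ms threshold; the threshold, sample lines, and device ID are assumptions, and on a live host you would read /var/log/vmkernel.log instead.

```shell
#!/bin/sh
# Hedged sketch: find the peak I/O latency reported by ScsiDeviceIO warnings
# and compare it against an assumed 100 ms (100000 us) threshold before
# retrying consolidation. Sample lines are fabricated stand-ins.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
... WARNING: ScsiDeviceIO: 1234: Device naa.aa performance has deteriorated. I/O latency increased from average value of 1533 microseconds to 475323 microseconds.
... WARNING: ScsiDeviceIO: 1234: Device naa.aa performance has deteriorated. I/O latency increased from average value of 1543 microseconds to 486743 microseconds.
EOF
# Pull the "increased ... to N microseconds" value from each warning, keep the max.
peak=$(sed -n 's/.*increased from average value of [0-9]* microseconds to \([0-9]*\) microseconds.*/\1/p' "$LOG" \
  | sort -n | tail -1)
if [ "${peak:-0}" -gt 100000 ]; then
  echo "peak latency ${peak} us exceeds 100 ms; investigate storage before retrying"
else
  echo "latency within threshold; reasonable to retry consolidation"
fi
rm -f "$LOG"
```

An acceptable ceiling depends on the workload and array; 100 ms is only a placeholder for whatever your environment's baseline suggests.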
For further assistance, you can open a support case with Broadcom through the Broadcom Support Portal, selecting ESXi Storage : Storage as the product.
Note: If disk consolidation fails with this issue, affected VMs may become unresponsive or inaccessible, temporarily disrupting VM access and associated services until the consolidation completes successfully.