Identify a known ESXi and vSAN infrastructure issue that causes RAV migrations to fail.
Symptoms: HCX RAV migrations to or from a vSAN DataStore may fail after an ESXi upgrade to 7.0 U3c.
The following exceptions can be seen in the logs during the failure event:
HCX app.log
vMotion failed. System Error. Source side error is : Source side relocate failed for the virtual machine. A fatal internal error occurred. See the virtual machine's log for more details. msg.svmotion.fail.internal:A fatal internal error occurred. See the virtual machine's log for more details. msg.svmotion.disk.loadphase.fail:Failed to load one or more destination disks. faultTime:2022-06-13T10:24:57.632051Z Target side error is : A general system error occurred: vMotion failed: unknown error msg.migrate.waitdata.platform:Failed waiting for data. Error 195887167. Connection closed by remote host, possibly due to timeout.
HCX Target MA log
2022-06-13T10:24:57.577Z In(05) worker-15918360 - OBJLIB-FILEBE : FileBEIoctl: ioctl operation IOCTLCMD_VMFS_DELTADISKS(3033) failed on '/vmfs/devices/vsansparse/cc9de53-7910a762-da61-cdfc-1b56-1c34da6056d4' : No such file or directory (131074)
2022-06-13T10:24:57.577Z Wa(03) worker-15918360 - SVMotionGetDeltaDiskUUID: Failed to query uuid from COW hierarchy: No such file or directory.
2022-06-13T10:24:57.577Z Wa(03) worker-15918360 - Mirror: scsi0:0: SVMotionLocalDiskLoad: Failed to get datastore uuid for destination disk /vmfs/volumes/vsan:522b0a19220d748d-d20b06d764135dd8/2c10a762-8865-e50a-212a-1c34da6056d4/tiny-VM-000000.vmdk.
2022-06-13T10:24:57.577Z Wa(03) worker-15918360 - Mirror: scsi0:0: Failed to load disk: /vmfs/volumes/vsan:522b0a19220d748d-d20b06d764135dd8/2c10a762-8865-e50a-212a-1c34da6056d4/tiny-VM-000000.vmdk.
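To triage many failed migrations, it can help to search exported log text for the messages quoted above. The sketch below is a hypothetical helper (`scan_log_lines` is not part of HCX or vSphere). It assumes you have already collected the HCX app.log and target MA log lines as plain strings. It matches on substrings copied from the excerpts in this article, which is a heuristic rather than an official detection rule, and it pulls out any vSAN destination disk path named in the "Failed to get datastore uuid" message.

```python
import re
from typing import Dict, Iterable, List

# Substrings copied from the log excerpts in this article (HCX app.log and
# the target MA log). Matching on them is a heuristic, not an official rule.
SIGNATURES = (
    "msg.svmotion.disk.loadphase.fail",
    "SVMotionGetDeltaDiskUUID: Failed to query uuid from COW hierarchy",
    "Failed to get datastore uuid for destination disk",
)

# Captures a vSAN-backed disk path such as /vmfs/volumes/vsan:<id>/<dir>/<disk>.vmdk
VSAN_DISK_RE = re.compile(r"(/vmfs/volumes/vsan:[0-9a-f-]+/\S+?\.vmdk)")


def scan_log_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Report which known signatures appear and which vSAN destination disks failed."""
    hits = set()
    disks: List[str] = []
    for line in lines:
        for sig in SIGNATURES:
            if sig in line:
                hits.add(sig)
        # Only the datastore-UUID failure names the destination disk path.
        if "Failed to get datastore uuid" in line:
            match = VSAN_DISK_RE.search(line)
            if match and match.group(1) not in disks:
                disks.append(match.group(1))
    return {"signatures": sorted(hits), "vsan_disks": disks}
```

If any signature is reported and the disk path sits under `/vmfs/volumes/vsan:`, the failure is consistent with the issue described here. Confirm by checking the ESXi build and the vSAN disk format on the affected cluster.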
Cause
During a RAV migration switchover, when delta vMotion is initiated, a query is made to fetch the DataStore UUID for the relocate task. In some cases, on the affected ESXi version, vSAN does not return the expected UUID, causing the migration workflow to fail.
vSphere management may hold null UUIDs for vSAN clusters, which triggers this problem.
Resolution
After upgrading to vSphere ESXi 7.0 U3c, perform the vSAN Disk Format Conversion (DFC) upgrade to remediate this issue.
This issue is resolved in ESXi 7.0 P05, scheduled for release on July 7th, 2022. The release date is subject to change without notice. Note: ESXi 7.0 P05 does not require a DFC upgrade to resolve this issue.
Workaround: If possible, a different DataStore type (VMFS5, VMFS6, or NFS) may be used as source or destination for the RAV migration. Alternatively, vMotion or Bulk migration can be used to relocate VMs between DataCenters.
Additional Information
Impact/Risks:
This issue ONLY affects HCX RAV migration workflows.
RAV migrations may be affected when the target OR source DataStore is a vSAN cluster running ESXi 7.0 U3c (build 19193900).
RAV migrations to/from other DataStore types (VMFS5, VMFS6, or NFS) are NOT affected.
There is NO impact on other migration services such as vMotion, Cold, or Bulk migration.
This issue is unrelated to HCX Network Extension services.
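The Impact/Risks criteria above can be expressed as a simple pre-migration check. This is a minimal sketch under stated assumptions: `MigrationPlan` and `may_be_affected` are hypothetical names, datastore types and migration types are plain strings you supply, and the per-side ESXi build is something you look up yourself. It does not model the remediations in the Resolution section (DFC upgrade or ESXi 7.0 P05), so a `True` result means "check this migration", not "this migration will fail".

```python
from dataclasses import dataclass

# Build number for ESXi 7.0 U3c, the affected build cited in this article.
AFFECTED_ESXI_BUILD = "19193900"


@dataclass
class MigrationPlan:
    migration_type: str     # e.g. "RAV", "vMotion", "Bulk", "Cold"
    source_datastore: str   # e.g. "vSAN", "VMFS5", "VMFS6", "NFS"
    target_datastore: str
    source_esxi_build: str  # build of the hosts backing the source DataStore
    target_esxi_build: str  # build of the hosts backing the target DataStore


def _side_at_risk(datastore: str, build: str) -> bool:
    # A side is at risk only if it is vSAN AND runs the affected build.
    return datastore.lower() == "vsan" and build == AFFECTED_ESXI_BUILD


def may_be_affected(plan: MigrationPlan) -> bool:
    """Rough triage per the Impact/Risks list: RAV only, vSAN on either side."""
    if plan.migration_type.upper() != "RAV":
        # vMotion, Cold, and Bulk migrations are not impacted.
        return False
    return (_side_at_risk(plan.source_datastore, plan.source_esxi_build)
            or _side_at_risk(plan.target_datastore, plan.target_esxi_build))
```

For example, `may_be_affected(MigrationPlan("RAV", "NFS", "vSAN", "00000000", "19193900"))` is `True` because the target side is vSAN on the affected build, while the same plan with `"Bulk"` as the migration type is `False`, matching the workaround of using Bulk migration instead.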