Symptoms:
- vMotion fails at 20-21%
- The issue occurs on ESXi 6.7 hosts with VMFS-6 datastores
- The vmware.log file for the affected VM reports the following errors:
2019-10-17T13:35:58.202Z| vmx| I125: Received migrate 'to' request for mid id 2384187699948208900, src ip <xx.xxx.xxx.xx>, dst ip <xx.xxx.xxx.xx>(invalidate source config).
2019-10-17T13:35:58.203Z| vmx| A100: ConfigDB: Setting vmotion.checkpointSVGASize = "9961472"
2019-10-17T13:36:06.429Z| vmx| W115: FILE: FileIO_Lock on '/vmfs/volumes/XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXXXXX/VMNAME/VMNAME.vmx' failed: Lock timed out
2019-10-17T13:36:06.430Z| vmx| I125: Msg_Reset:
2019-10-17T13:36:06.430Z| vmx| I125: [msg.configdb.open] An error occurred while opening configuration file "/vmfs/volumes/XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXXXXX/VMNAME/VMNAME.vmx": Failed to lock the file.
2019-10-17T13:37:36.622Z| vmx| W115: FILE: FileIO_Lock on '/vmfs/volumes/XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXXXXX/VMNAME/VMNAME.vmx' failed: Lock timed out
2019-10-17T13:37:36.623Z| vmx| I125: Msg_Reset:
2019-10-17T13:37:36.623Z| vmx| I125: [msg.configdb.open] An error occurred while opening configuration file "/vmfs/volumes/XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXXXXX/VMNAME/VMNAME.vmx": Failed to lock the file.
2019-10-17T13:37:36.623Z| vmx| I125: ----------------------------------------
2019-10-17T13:37:36.623Z| vmx| W115: Migrate: Failed to write out config file.
2019-10-17T13:37:36.623Z| vmx| I125: Migrate: Caching migration error message list:
2019-10-17T13:37:36.623Z| vmx| I125: [msg.migrate.expired] Timed out waiting for migration start request.
- The VMkernel log on the source host reports the following errors:
2019-10-17T13:36:26.478Z cpu0:2100106)DLX: 4949: vol 'DATASTORE_NAME', lock at 154738688: [Req mode: 1] Not free:
2019-10-17T13:36:26.478Z cpu0:2100106)[type 10c00001 offset 154738688 v 4701, hb offset 3641344
gen 4865, mode 1, owner XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXXXXX mtime 571270
num 0 gblnum 0 gblgen 0 gblbrk 0] alloc owner 4063232
2019-10-17T13:36:30.484Z cpu4:2100106)DLX: 4949: vol 'DATASTORE_NAME', lock at 154738688: [Req mode: 1] Not free:
2019-10-17T13:36:30.484Z cpu4:2100106)[type 10c00001 offset 154738688 v 4701, hb offset 3641344
gen 4865, mode 1, owner XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXXXXX mtime 571270
num 0 gblnum 0 gblgen 0 gblbrk 0] alloc owner 4063232
- The VMkernel log on the destination host reports the following errors:
2019-10-17T13:35:45.767Z cpu19:2175769)HBX: 6416: 'DATASTORE_NAME': HB at offset 3551232 - Marking HB:
2019-10-17T13:35:45.767Z cpu19:2175769) [HB state abcdef04 offset 3551232 gen 7 stampUS 131655160 uuid XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXXXXX jrnl drv 24.82 lockImpl 4 ip xx.xxx.xxx.xx]
2019-10-17T13:36:01.768Z cpu18:2175769)HBX: 6433: 'DATASTORE_NAME': HB at offset 3551232 - Skipping replay as HB is being replayed by another live host:
2019-10-17T13:36:01.768Z cpu18:2175769) [HB state abcdef04 offset 3551232 gen 7 stampUS 131655160 uuid XXXXXXXX-XXXXXXXX-XXXX-XXXXXXXXXXXX jrnl drv 24.82 lockImpl 4 ip xx.xxx.xxx.xx]
2019-10-17T13:36:01.768Z cpu18:2175769)Res3: 2328: Rank violation threshold reached: cid 0xc1d00002, resType 1, cnum 5 vol DATASTORE_NAME
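
To check whether a set of collected logs shows the same pattern, the signatures above can be searched for with a short script. The following is a minimal sketch, not a VMware tool: the patterns are copied from the excerpts in this section, while the signature names and the find_signatures helper are hypothetical names chosen for illustration. Numeric fields such as the DLX/HBX/Res3 line numbers are matched loosely, on the assumption that they can differ between builds.

```python
import re

# Patterns copied from the log excerpts in this section.
# The dictionary keys are illustrative labels, not VMware terminology.
SIGNATURES = {
    # vmware.log: the .vmx file could not be locked during migration
    "vmx_lock_timeout": re.compile(
        r"FileIO_Lock on '(?P<path>[^']+)' failed: Lock timed out"
    ),
    # vmware.log: the migration gave up writing the config file
    "migrate_config_write": re.compile(
        r"Migrate: Failed to write out config file"
    ),
    # VMkernel log (source host): on-disk lock is held by another owner
    "dlx_lock_not_free": re.compile(
        r"DLX: \d+: vol '(?P<vol>[^']+)', lock at \d+: \[Req mode: \d+\] Not free"
    ),
    # VMkernel log (destination host): heartbeat replay skipped
    "hb_replay_skipped": re.compile(
        r"HBX: \d+: '(?P<vol>[^']+)': HB at offset \d+ - Skipping replay"
    ),
    # VMkernel log (destination host): rank violation threshold
    "rank_violation": re.compile(
        r"Res3: \d+: Rank violation threshold reached"
    ),
}


def find_signatures(lines):
    """Count how many lines match each known signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle for a copied vmware.log or vmkernel.log can be passed in directly. A non-zero count for the vmware.log signatures together with the source-host and destination-host signatures suggests the same pattern as described here, but it does not by itself confirm the cause.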