vMotion of virtual machines fails after reaching 68%


Article ID: 412117

Products

VMware vSphere ESXi

Issue/Introduction

  • Manual and DRS-initiated vMotion operations fail.
  • The migration may fail with the following error:

The migration has exceeded the maximum switchover time of 100 second(s). ESX has preemptively failed the migration to allow the VM to continue running on the source.  To avoid this failure, either increase the maximum allowable switchover time or wait until the VM is performing a less intensive workload.

  • All affected virtual machines are VDI VMs provisioned by Horizon View.
  • In the vmware.log of an affected virtual machine, you may see entries like the following, where a CID mismatch is detected and re-creating the digest disk takes more than 100 seconds:

In(05) worker-241568940 - DIGESTLIB-FILE : DigestLibFileOpenInt: CID mismatch -> disk=0x61cfbc00, header=0x61d37aaa.
In(05) worker-241568940 - DISKLIB-LIB_CREATE   : Create type of digest disk 'path of vmdk' is chosen as 26.
In(05) vmx - DISK: Opening disks took 110000 ms.

  • Content-Based Read Cache (CBRC), a feature that Horizon View uses for VDI VMs, is being enabled and disabled very frequently. You may find entries like the following in /var/run/log/vmkernel.log on the ESXi host:

cpu76:2100334 opID=#####)CBRC: 1580: CBRC module is disabled enable(0) cbrcdata(0) status(Success)
cpu67:2099849 opID=#####)CBRC: 1574: CBRC module is enabled(1) cbrcdata(1) status(Success)
cpu26:2100313 opID=#####)CBRC: 1580: CBRC module is disabled enable(0) cbrcdata(0) status(Success)
cpu62:2101992 opID=#####)CBRC: 1574: CBRC module is enabled(1) cbrcdata(1) status(Success)
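To gauge how often CBRC is being toggled, the vmkernel.log entries can be counted with a short script. The following is a minimal sketch, not a supported tool: the function name count_cbrc_toggles is hypothetical, and the pattern assumes the message wording matches the sample entries quoted in this article.

```python
import re

# Assumption: CBRC state messages look like the samples quoted in this article.
CBRC_RE = re.compile(r"CBRC: \d+: CBRC module is (enabled|disabled)")


def count_cbrc_toggles(lines):
    """Return (enabled_count, disabled_count, transitions) for vmkernel.log lines."""
    states = [m.group(1) for line in lines if (m := CBRC_RE.search(line))]
    # A transition is any change of state between consecutive CBRC messages.
    transitions = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return states.count("enabled"), states.count("disabled"), transitions


# Sample entries copied from this article.
sample = [
    "cpu76:2100334 opID=#####)CBRC: 1580: CBRC module is disabled enable(0) cbrcdata(0) status(Success)",
    "cpu67:2099849 opID=#####)CBRC: 1574: CBRC module is enabled(1) cbrcdata(1) status(Success)",
    "cpu26:2100313 opID=#####)CBRC: 1580: CBRC module is disabled enable(0) cbrcdata(0) status(Success)",
    "cpu62:2101992 opID=#####)CBRC: 1574: CBRC module is enabled(1) cbrcdata(1) status(Success)",
]

print(count_cbrc_toggles(sample))
```

To run it against a real log, pass the lines of a copy of /var/run/log/vmkernel.log instead of the sample list. A high transition count over a short period is consistent with the behavior described above.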

Environment

  • VMware vSphere ESXi 7.x
  • VMware vSphere ESXi 8.x

Cause

  • vMotion enforces a threshold of 100 seconds for disk-related activities.
  • A CID mismatch between the data disk and its digest disk triggers re-creation of the digest disk. When the re-creation takes longer than 100 seconds, vMotion fails.
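To confirm this cause on a given VM, the vmware.log can be checked for both the CID mismatch and a disk-open time above the 100-second threshold. The following is a rough sketch under stated assumptions: the function name check_vmware_log is hypothetical, and the patterns assume the log wording matches the entries quoted in this article.

```python
import re

# 100-second threshold described in this article, expressed in milliseconds.
SWITCHOVER_LIMIT_MS = 100 * 1000

# Assumption: message wording matches the vmware.log samples in this article.
CID_RE = re.compile(r"DigestLibFileOpenInt: CID mismatch")
OPEN_RE = re.compile(r"DISK: Opening disks took (\d+) ms")


def check_vmware_log(lines, limit_ms=SWITCHOVER_LIMIT_MS):
    """Report whether a CID mismatch was logged and whether disk open time exceeded the limit."""
    cid_mismatch = any(CID_RE.search(line) for line in lines)
    durations = [int(m.group(1)) for line in lines if (m := OPEN_RE.search(line))]
    slowest = max(durations, default=0)
    return {
        "cid_mismatch": cid_mismatch,
        "slowest_open_ms": slowest,
        "exceeds_limit": slowest > limit_ms,
    }


# Sample entries copied from this article.
sample_log = [
    "In(05) worker-241568940 - DIGESTLIB-FILE : DigestLibFileOpenInt: CID mismatch -> disk=0x61cfbc00, header=0x61d37aaa.",
    "In(05) vmx - DISK: Opening disks took 110000 ms.",
]

result = check_vmware_log(sample_log)
print(result)
```

A result with both cid_mismatch and exceeds_limit set to True matches the failure pattern described in this article. Either flag on its own may point to a different cause.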

Resolution

  • Engage Horizon support to investigate why CBRC is being toggled so frequently.
  • As a workaround, retry the migration after it fails. The retry should succeed provided that no CBRC toggle operations occur during the migration.