VMs with CBT enabled on vVols may fail to vMotion and go into an orphaned state or crash

Article ID: 320558

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

In vSphere 7.0 and 8.0, VMs with CBT enabled on vVols may fail to vMotion and show the following symptoms:

VMs may crash and create a zdump file in the VM directory.
VMs may go into an orphaned state in vCenter.

Under the Storage Providers tab in vCenter, you may also see:

Host I/O filters go offline.
VASA providers go offline.

You may also see the following log pattern in the vmware.log files of the VM at the time of the issue.

Source Log: vmware-14.log
YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - Closing all the disks of the VM.
YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - Closing disk 'scsi0:0'
YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - DISKLIB-CBT   : Shutting down change tracking for untracked fid 45944410.                                                        <===================== Closing 
YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - DISKLIB-CBT   : Successfully disconnected CBT node.
YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - OBJLIB-VVOLOBJ : VVolObjClose: Closed VVol 'rfc4122.<uuid>', Time taken: 266239 microseconds.
YYYY-MM-DDTHH:MM:SS In(05) vcpu-0 - DISKLIB-VMFS  : "vvol:#########################/rfcXXX.<uuid>" : closed.                   <===================  Approx 8 seconds to close.
YYYY-MM-DDTHH:MM:SS In(05) worker-2436867 - Migrate: Remote Log: Destination waited for 14.78 seconds.                                                                   
YYYY-MM-DDTHH:MM:SS In(05) worker-2436867 - Migrate: Remote Log: Beginning checkpoint restore.
YYYY-MM-DDTHH:MM:SS In(05) worker-2436867 - Migrate: Remote Log: Switching to checkpoint state.


Destination Log: vmware-15.log
YYYY-MM-DDTHH:MM:SS In(05) vmx - Destroying virtual dev for scsi0:0 vscsi=10488095388481824
YYYY-MM-DDTHH:MM:SS In(05) vmx - VMMon_VSCSIStopVports: No such target on adapter
YYYY-MM-DDTHH:MM:SS In(05) vmx - DISKLIB-LIB   : DiskLib_ForceLoadFilters: Disk was opened with OPEN_NOFILTERS. Forcing a delayed load of all filters.
YYYY-MM-DDTHH:MM:SS In(05) vmx - DISKLIB-LIB_BLOCKTRACK   : Resuming from change tracking info file /vmfs/volumes/vvol:#########################/rfc<uuid>/SAMPLE-VM-ctk.vmdk.
YYYY-MM-DDTHH:MM:SS In(05) vmx - DISKLIB-CTK   : ChangeTrackerOpenOnDiskWork: Could not open tracking file /vmfs/volumes/vvol:#########################/rfc4122.<uuid>/SAMPLE-VM-ctk.vmdk (4). <======= Failure on destination Host.
YYYY-MM-DDTHH:MM:SS In(05) vmx - DISKLIB-CTK   : Could not open change tracking file "/vmfs/volumes/vvol:#########################/rfc4122.<uuid>/SAMPLE-VM-ctk.vmdk": Could not open or create change tracking file.
YYYY-MM-DDTHH:MM:SS In(05) vmx - DISKLIB-LIB_BLOCKTRACK   : Could not open change tracker /vmfs/volumes/vvol:#########################/rfc4122.<uuid>/SAMPLE-VM-ctk.vmdk: Could not open or create change tracking file.
YYYY-MM-DDTHH:MM:SS In(05) vmx - DISKLIB-LIB   : DiskLib_ForceLoadFilters: DiskLibBlockTrackResume failed : Could not open or create change tracking file (0x83c).
YYYY-MM-DDTHH:MM:SS Wa(03) vmx - DiskDelayedFilterAttachAll Disk 'scsi0:0' force load filters error: Could not open or create change tracking file.
YYYY-MM-DDTHH:MM:SS In(05) vmx - MigrateSetStateFinished: type=2 new state=12
YYYY-MM-DDTHH:MM:SS In(05) vmx - MigrateSetState: Transitioning from state 11 to 12.
YYYY-MM-DDTHH:MM:SS In(05) vmx - Migrate: Caching migration error message list:
YYYY-MM-DDTHH:MM:SS In(05) vmx - [msg.migrate.multiwriter.delayed.filter.attach.failed] Delayed filter attach failed on destination.
YYYY-MM-DDTHH:MM:SS Cr(01) vmx - PANIC: Delayed filter attach failed on destination virtual machine during migration.
YYYY-MM-DDTHH:MM:SS Wa(03) vmx - A core file is available in "/vmfs/volumes/vvol:#########################/rfc4122.<uuid>/vmx-zdump.000"
YYYY-MM-DDTHH:MM:SS In(05) vmx - Backtrace:
YYYY-MM-DDTHH:MM:SS Wa(03) mks - Panic in progress... ungrabbing 

 

Environment

VMware vSphere ESXi 7.0
VMware vSphere ESXi 8.0

Cause

The problem is encountered in some instances when CBT is enabled on a VM and it takes longer than anticipated to flush in-memory CBT data to disk before the disk is closed. While the source host is still closing the disk during a vMotion, the destination host cannot open the change tracking (-ctk.vmdk) file, the delayed filter attach fails, and the VM panics, as seen in the logs above.

Resolution

This issue has been resolved in ESXi 7.0 Update 3q and ESXi 8.0 Update 3. Should you experience this problem, update your ESXi host to one of these versions or later.

Alternatively, you can use the following steps to work around the issue.

The current workaround to prevent VMs from being impacted is to disable CBT. Note that as a result of disabling CBT, you will no longer be able to take incremental backups of these VMs.

To disable CBT while the VM is powered off, use the steps below in the VM's advanced configuration (a scripted alternative follows these steps):

  • Power off the virtual machine.
  • Right-click the virtual machine and click Edit Settings.
  • Click the Options tab.
  • Click General under the Advanced section and then click Configuration Parameters. The Configuration Parameters dialog opens.
  • Set ctkEnabled to false, and set scsiX:Y.ctkEnabled to false for each SCSI disk on which CBT should be disabled (for example, scsi0:0.ctkEnabled).
  • Power on the virtual machine.
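
If you prefer to script this change, the same advanced settings can be applied through the vSphere API. The following is a minimal pyVmomi sketch, not an official procedure: the vCenter address, credentials, VM name SAMPLE-VM, and disk key scsi0:0 are placeholder assumptions that you must replace with values from your environment.

# Minimal pyVmomi sketch: disable CBT on a powered-off VM by setting the
# ctkEnabled advanced configuration parameters via ReconfigVM_Task.
# vCenter address, credentials, VM name, and the scsi0:0 disk key are
# placeholders -- replace them with values from your environment.
import ssl
import time

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="password",
                  sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    # Find the VM by name (the container view is cleaned up on disconnect).
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "SAMPLE-VM")

    # The advanced settings only take effect while the VM is powered off.
    assert vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOff

    spec = vim.vm.ConfigSpec(extraConfig=[
        vim.option.OptionValue(key="ctkEnabled", value="false"),
        # Repeat the per-disk key for every disk with CBT enabled.
        vim.option.OptionValue(key="scsi0:0.ctkEnabled", value="false"),
    ])
    task = vm.ReconfigVM_Task(spec=spec)
    while task.info.state not in (vim.TaskInfo.State.success,
                                  vim.TaskInfo.State.error):
        time.sleep(1)  # simple poll until the task completes
    print("Reconfigure task finished:", task.info.state)
finally:
    Disconnect(si)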

For more information on CBT, please see the following KB:
Changed Block Tracking (CBT) on virtual machines (1020128)


 

To disable CBT while the VM is powered on, use the steps below in the vCenter Managed Object Browser (MOB):

  1. Log in to the vSphere Client.
  2. Navigate to the VM on which CBT should be disabled and copy the VM ID from the browser URL. Use only the 'vm-<number>' portion, not the entire string.

  3. Navigate to the vCenter MOB using the URL below, substituting in the relevant VM ID:

https://<VC-FQDN/IP>/mob/?moid=<vm-id>

  4. Under the "Methods" section, select the "ReconfigVM_Task" method.

  5. In the new window that opens, replace the default spec with the spec below and invoke the method. A reconfigure task for the VM will then appear under Tasks and Events in the vCenter UI:

<spec>
<changeTrackingEnabled>false</changeTrackingEnabled>
</spec>


  6. Take a snapshot of the VM and then consolidate it, or vMotion the VM to another host, so that the change takes effect (a scripted alternative to these steps follows).
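
If you prefer to script steps 3 to 6, the same ReconfigVM_Task method can be invoked through the vSphere API. The following is a minimal pyVmomi sketch, not an official procedure: the vCenter address, credentials, and VM name SAMPLE-VM are placeholder assumptions that you must replace with your own values.

# Minimal pyVmomi sketch: the same ReconfigVM_Task invocation as the MOB
# steps above, setting changeTrackingEnabled to false on a powered-on VM.
# vCenter address, credentials, and VM name are placeholders.
import ssl
import time

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="password",
                  sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "SAMPLE-VM")

    # Equivalent of the <spec> XML from step 5.
    spec = vim.vm.ConfigSpec(changeTrackingEnabled=False)
    task = vm.ReconfigVM_Task(spec=spec)
    while task.info.state not in (vim.TaskInfo.State.success,
                                  vim.TaskInfo.State.error):
        time.sleep(1)  # simple poll until the task completes
    print("Reconfigure task finished:", task.info.state)
    # On a powered-on VM the change only takes effect after a stun operation:
    # a snapshot followed by consolidation, or a vMotion (step 6 above).
finally:
    Disconnect(si)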