VMs with CBT enabled on vVols may fail to vMotion and go into an orphaned state or crash

Article ID: 320558

Updated On: 10-25-2024

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

In vSphere 7.0 and 8.0, VMs with Changed Block Tracking (CBT) enabled on vVols may fail to vMotion and show the following symptoms:

VMs may crash and create a zdump file (vmx-zdump) in the VM directory.
VMs may go into an orphaned state in vCenter.

Under the Storage Providers tab in vCenter, you may also see:

Host I/O filters go offline.
VASA provider filters go offline.

You may also see the following log pattern in the vmware.log files of the VM (on the source and destination hosts) at the time of the issue.

Source Log:
vmware-14.log

2023-xx-xxT17:41:49.391Z In(05) vcpu-0 - Closing all the disks of the VM.
2023-xx-xxT17:41:49.391Z In(05) vcpu-0 - Closing disk 'scsi0:0'
2023-xx-xxT17:41:49.394Z In(05) vcpu-0 - DISKLIB-CBT   : Shutting down change tracking for untracked fid 45944410.                                                        <===================== Closing 
2023-xx-xxT17:41:49.394Z In(05) vcpu-0 - DISKLIB-CBT   : Successfully disconnected CBT node.
2023-xx-xxT17:41:57.975Z In(05) vcpu-0 - OBJLIB-VVOLOBJ : VVolObjClose: Closed VVol 'rfc4122.<uuid>', Time taken: 266239 microseconds.
2023-xx-xxT17:41:57.975Z In(05) vcpu-0 - DISKLIB-VMFS  : "vvol:0000005600009709-9450605c5098a233/rfc4122.<uuid>" : closed.                   <===================  Approx 8 seconds to close.  
2023-xx-xxT17:41:57.975Z In(05) worker-2436867 - Migrate: Remote Log: Destination waited for 14.78 seconds.                                                                   
2023-xx-xxT17:41:57.975Z In(05) worker-2436867 - Migrate: Remote Log: Beginning checkpoint restore.
2023-xx-xxT17:41:57.975Z In(05) worker-2436867 - Migrate: Remote Log: Switching to checkpoint state.
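To measure how long the source host spent closing the disk, you can compare the timestamp of the DISKLIB-CBT shutdown message with the DISKLIB-VMFS "closed." message. The sketch below is a hypothetical Python helper (disk_close_seconds is not a VMware tool). It assumes unredacted vmware.log timestamps in the YYYY-MM-DDTHH:MM:SS.mmmZ form shown above; because the dates in this article are redacted (2023-xx-xx), the sample uses a made-up date.

```python
from datetime import datetime
from typing import List, Optional

# Assumed timestamp layout at the start of a vmware.log line, e.g. "2023-01-01T17:41:49.394Z".
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_ts(line: str) -> datetime:
    """Parse the timestamp at the start of a vmware.log line."""
    return datetime.strptime(line.split(" ", 1)[0], TS_FORMAT)


def disk_close_seconds(lines: List[str]) -> Optional[float]:
    """Seconds from the CBT shutdown message to the next DISKLIB-VMFS 'closed.' message.

    Hypothetical helper: returns None if either message is missing.
    """
    start = None
    for line in lines:
        if start is None and "Shutting down change tracking" in line:
            start = parse_ts(line)
        elif start is not None and "DISKLIB-VMFS" in line and ": closed." in line:
            return (parse_ts(line) - start).total_seconds()
    return None


if __name__ == "__main__":
    # Sample lines adapted from the source log above, with a made-up date.
    sample = [
        "2023-01-01T17:41:49.394Z In(05) vcpu-0 - DISKLIB-CBT   : Shutting down change tracking for untracked fid 45944410.",
        '2023-01-01T17:41:57.975Z In(05) vcpu-0 - DISKLIB-VMFS  : "vvol:<container>/rfc4122.<uuid>" : closed.',
    ]
    print(disk_close_seconds(sample))  # about 8.58 seconds
```

Lines with redacted dates such as 2023-xx-xx will make parse_ts raise ValueError, so run it against the original, unredacted log.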


Destination Log:
vmware-15.log
2023-xx-xxT17:41:50.792Z In(05) vmx - Destroying virtual dev for scsi0:0 vscsi=10488095388481824
2023-xx-xxT17:41:50.792Z In(05) vmx - VMMon_VSCSIStopVports: No such target on adapter
2023-xx-xxT17:41:50.792Z In(05) vmx - DISKLIB-LIB   : DiskLib_ForceLoadFilters: Disk was opened with OPEN_NOFILTERS. Forcing a delayed load of all filters.
2023-xx-xxT17:41:50.792Z In(05) vmx - DISKLIB-LIB_BLOCKTRACK   : Resuming from change tracking info file /vmfs/volumes/vvol:0000005600009709-9450605c5098a233/rfc<uuid>/SAMPLE-VM-ctk.vmdk.
2023-xx-xxT17:41:54.802Z In(05) vmx - DISKLIB-CTK   : ChangeTrackerOpenOnDiskWork: Could not open tracking file /vmfs/volumes/vvol:0000005600009709-9450605c5098a233/rfc4122.<uuid>/SAMPLE-VM-ctk.vmdk (4). <======= Failure on destination Host.
2023-xx-xxT17:41:54.802Z In(05) vmx - DISKLIB-CTK   : Could not open change tracking file "/vmfs/volumes/vvol:0000005600009709-9450605c5098a233/rfc4122.<uuid>/SAMPLE-VM-ctk.vmdk": Could not open or create change tracking file.
2023-xx-xxT17:41:54.802Z In(05) vmx - DISKLIB-LIB_BLOCKTRACK   : Could not open change tracker /vmfs/volumes/vvol:0000005600009709-9450605c5098a233/rfc4122.<uuid>/SAMPLE-VM-ctk.vmdk: Could not open or create change tracking file.
2023-xx-xxT17:41:54.802Z In(05) vmx - DISKLIB-LIB   : DiskLib_ForceLoadFilters: DiskLibBlockTrackResume failed : Could not open or create change tracking file (0x83c).
2023-xx-xxT17:41:54.802Z Wa(03) vmx - DiskDelayedFilterAttachAll Disk 'scsi0:0' force load filters error: Could not open or create change tracking file.
2023-xx-xxT17:41:54.802Z In(05) vmx - MigrateSetStateFinished: type=2 new state=12
2023-xx-xxT17:41:54.802Z In(05) vmx - MigrateSetState: Transitioning from state 11 to 12.
2023-xx-xxT17:41:54.802Z In(05) vmx - Migrate: Caching migration error message list:
2023-xx-xxT17:41:54.802Z In(05) vmx - [msg.migrate.multiwriter.delayed.filter.attach.failed] Delayed filter attach failed on destination.
2023-xx-xxT17:41:54.802Z Cr(01) vmx - PANIC: Delayed filter attach failed on destination virtual machine during migration.
2023-xx-xxT17:41:55.977Z Wa(03) vmx - A core file is available in "/vmfs/volumes/vvol:0000005600009709-9450605c5098a233/rfc4122.<uuid>/vmx-zdump.000"
2023-xx-xxT17:41:55.977Z In(05) vmx - Backtrace:
2023-xx-xxT17:41:55.977Z Wa(03) mks - Panic in progress... ungrabbing
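To check a destination vmware.log for this pattern, you can search it for the message fragments shown above. The sketch below is a hypothetical Python helper: the fragment list is copied from the excerpt in this article, and treating "change tracking file could not be opened" plus "delayed filter attach failed" as the signature is a heuristic, not an official detection rule.

```python
import sys
from typing import Dict, Iterable

# Message fragments copied from the destination vmware.log excerpt in this article.
SIGNATURES: Dict[str, str] = {
    "ctk_open_failed": "Could not open or create change tracking file",
    "filter_attach_failed": "Delayed filter attach failed on destination",
    "vmx_panic": "PANIC: Delayed filter attach failed",
    "core_dump": "vmx-zdump",
}


def match_signatures(lines: Iterable[str]) -> Dict[str, bool]:
    """Report which known message fragments occur anywhere in the log."""
    found = {name: False for name in SIGNATURES}
    for line in lines:
        for name, fragment in SIGNATURES.items():
            if fragment in line:
                found[name] = True
    return found


def looks_like_this_issue(lines: Iterable[str]) -> bool:
    """Heuristic: both the CTK open failure and the delayed filter attach failure are present."""
    hits = match_signatures(lines)
    return hits["ctk_open_failed"] and hits["filter_attach_failed"]


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_vmware_log.py /path/to/vmware.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        lines = log.readlines()
    print(match_signatures(lines))
    print("matches this article's pattern:", looks_like_this_issue(lines))
```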

 

Environment

VMware vSphere ESXi 7.0
VMware vSphere ESXi 8.0

Cause

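The problem can occur when CBT is enabled on a VM and flushing the in-memory CBT data to disk takes longer than expected before the disk is closed on the source host. In the logs above, the source host takes about 8 seconds to close the disk, and during that window the destination host fails to open the change tracking file (-ctk.vmdk); the delayed filter attach then fails and the VMX panics.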

Resolution

This issue is resolved in ESXi 7.0 Update 3q and ESXi 8.0 Update 3. If you experience this problem, update your ESXi hosts to one of these releases.

Alternatively, you can work around the issue by disabling CBT on the affected VMs. Note that with CBT disabled, you can no longer perform incremental backups of these VMs.

To disable CBT while the VM is powered off, use the following steps in the VM's advanced configuration:

  • Power off the virtual machine.
  • Right-click the virtual machine and click Edit Settings.
  • Click the Options tab.
  • Click General under the Advanced section and then click Configuration Parameters. The Configuration Parameters dialog opens.
  • Set the ctkEnabled parameters to false for the desired SCSI disk(s) (for example, scsi0:0.ctkEnabled).
  • Power on the virtual machine.
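The Configuration Parameters dialog edits key/value pairs that are stored in the VM's .vmx file. The sketch below illustrates, on a text copy of a .vmx file, which entries end up set to false. disable_ctk is a hypothetical helper; the key = "value" line format and case-insensitive key matching are assumptions, and the supported way to make the change remains the dialog steps above.

```python
import re
from typing import List

# Assumed .vmx line format: key = "value"
ENTRY = re.compile(r'^\s*([A-Za-z0-9_.:]+)\s*=\s*".*"\s*$')


def disable_ctk(vmx_text: str, disks: List[str], vm_level: bool = True) -> str:
    """Return a copy of vmx_text with the CBT keys for `disks` set to "FALSE".

    Hypothetical helper. Keys that are missing are appended at the end.
    """
    keys = (["ctkEnabled"] if vm_level else []) + [f"{d}.ctkEnabled" for d in disks]
    wanted = {k.lower(): k for k in keys}  # keys compared case-insensitively (assumption)
    out, seen = [], set()
    for line in vmx_text.splitlines():
        m = ENTRY.match(line)
        if m and m.group(1).lower() in wanted:
            out.append(f'{m.group(1)} = "FALSE"')
            seen.add(m.group(1).lower())
        else:
            out.append(line)
    for low, key in wanted.items():
        if low not in seen:
            out.append(f'{key} = "FALSE"')
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    before = 'displayName = "SAMPLE-VM"\nctkEnabled = "TRUE"\nscsi0:0.ctkEnabled = "TRUE"\n'
    print(disable_ctk(before, ["scsi0:0"]), end="")
```

With vm_level=False only the per-disk entries are changed, which matches disabling CBT for selected disks rather than the whole VM.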

For more information on CBT, see the following KB article:
Changed Block Tracking (CBT) on virtual machines (1020128)


 

To disable CBT while the VM is powered on, use the following steps in the vCenter Managed Object Browser (MOB):

  1. Log in to the vSphere Client.
  2. Navigate to the VM on which CBT should be disabled and copy the VM ID from the browser URL. Copy only the part similar to 'vm-<number>', not the entire string.

  3. Navigate to the vCenter MOB using the following URL, substituting the relevant VM ID:

https://<VC-FQDN/IP>/mob/?moid=<vm-id>

  4. Under the "Methods" section, locate the "ReconfigVM_Task" method.

  5. Click this method to open a new window. Replace the default spec with the spec below and invoke the method. A reconfigure task for the VM then appears in Tasks and Events in the vSphere Client.

<spec>
<changeTrackingEnabled>false</changeTrackingEnabled>
</spec>


  6. For the change to take effect, take a snapshot of the VM and then delete it (consolidate), or vMotion the VM to another host.
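The MOB steps above can be sketched in Python as a small helper that pulls the VM ID out of a vSphere Client URL and builds the MOB URL and the spec to paste. extract_vm_id, mob_url, and the sample client URL are hypothetical; the assumption is that the VM ID appears in the client URL as vm-<number>, as step 2 describes. The helper only prepares text: invoking ReconfigVM_Task and the snapshot or vMotion step are still done manually.

```python
import re
from urllib.parse import quote

# A VM managed object ID, e.g. "vm-1234".
VM_ID = re.compile(r"\bvm-\d+\b")

# The ReconfigVM_Task spec from step 5, as plain text for pasting into the MOB.
DISABLE_CBT_SPEC = (
    "<spec>\n"
    "<changeTrackingEnabled>false</changeTrackingEnabled>\n"
    "</spec>"
)


def extract_vm_id(client_url: str) -> str:
    """Pull the 'vm-<number>' part out of a vSphere Client URL (step 2)."""
    match = VM_ID.search(client_url)
    if match is None:
        raise ValueError("no 'vm-<number>' ID found in URL")
    return match.group(0)


def mob_url(vcenter: str, vm_id: str) -> str:
    """Build the MOB URL from step 3 for the given vCenter FQDN/IP and VM ID."""
    return f"https://{vcenter}/mob/?moid={quote(vm_id)}"


if __name__ == "__main__":
    # Hypothetical client URL; the real format may differ between vSphere versions.
    url = ("https://vcsa.example.com/ui/app/vm;nav=h/"
           "urn:vmomi:VirtualMachine:vm-1234:11111111-2222-3333-4444-555555555555/summary")
    vm_id = extract_vm_id(url)
    print(mob_url("vcsa.example.com", vm_id))  # https://vcsa.example.com/mob/?moid=vm-1234
    print(DISABLE_CBT_SPEC)
```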