VM power on fails at 73% with error: Change tracking invalid or disk in use

Article ID: 322285

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
Powering on the VM stalls at 73% and fails with the generic error "Operation timed out".
The vmware.log file reports the following errors for the change tracking (ctk) files:

vmx| I125: DISKLIB-CTK   : Could not open change tracking file "/vmfs/volumes/12345678-111111111-abcd-efg1234567/ExampleVM_2/ExampleVM_2-000004-ctk.vmdk": Change tracking invalid or disk in use.
vmx| I125: DISKLIB-CTK   : Change tracking invalid; reinitializing.
vmx| I125: DISKLIB-CTK   : Auto blocksize for size 6442450944 is 4096.
'/vmfs/volumes/12345678-111111111-abcd-efg1234567/ExampleVM_2/ExampleVM_2_1.vmdk' (0xe): vmfs, 6442450944 sectors / 3 TB.
vmx| I125: OBJLIB-FILEBE : Error creating file '/vmfs/volumes/12345678-111111111-abcd-efg1234567/ExampleVM_2/ExampleVM_2-000004-ctk.vmdk': 3 (The file already exists).
vmx| I125: DISKLIB-CBT   : Initializing ESX kernel change tracking for fid 15341702.
vmx| I125: DISKLIB-CBT   : Successfuly created cbt node ea1886-cbt.
vmx| I125: DISKLIB-CBT   : Opening cbt node /vmfs/devices/cbt/ea1886-cbt
worker-2653492| I125: DISKLIB-LIB_BLOCKTRACK   : Resuming change tracking.
worker-2653491| I125: DISKLIB-LIB_BLOCKTRACK   : Resuming change tracking.
worker-2653492| I125: DISKLIB-CTK   : Could not open change tracking file "/vmfs/volumes/12345678-111111111-abcd-efg1234567/ExampleVM_2/ExampleVM_2-000004-ctk.vmdk": Change tracking invalid or disk in use.
worker-2653491| I125: DISKLIB-CTK   : Could not open change tracking file "/vmfs/volumes/12345678-111111111-abcd-efg1234567/ExampleVM_2/ExampleVM_2-000004-ctk.vmdk": Change tracking invalid or disk in use

The VM cannot be powered on. In some cases the power-on task runs for more than 24 hours without ever returning an error.
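The ctk failures shown in the log excerpt above can be pulled out of a vmware.log with standard tools. The following is a minimal sketch, not an official procedure; the function names find_ctk_errors and list_failing_ctk_files are hypothetical, and the match string is taken from the log lines above.

```shell
# Hypothetical helper: print every "could not open" CTK error in a vmware.log.
# Usage: find_ctk_errors /path/to/vmware.log
find_ctk_errors() {
    log_file="$1"
    # -F treats the pattern as a fixed string, exactly as it appears in the log.
    grep -F 'Could not open change tracking file' "$log_file"
}

# Hypothetical helper: print each affected -ctk.vmdk path once.
list_failing_ctk_files() {
    find_ctk_errors "$1" |
        sed -n 's/.*change tracking file "\([^"]*\)".*/\1/p' |
        sort -u
}
```

For example, list_failing_ctk_files /vmfs/volumes/<datastore>/ExampleVM_2/vmware.log prints the path of every ctk file that the VM failed to open.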

Resolution

The log entries above show that the power-on is blocked by errors on the ctk files. To resolve the issue, temporarily move (or delete) the ctk files and disable change tracking for the VM.

1. Before moving any files, check for locks on the VM's disk files using the following command:

# vmfsfilelockinfo -p ExampleVM_2-flat.vmdk

vmfsfilelockinfo Version 2.0
Looking for lock owners on "ExampleVM_2-flat.vmdk"
"ExampleVM_2-flat.vmdk" is locked in Read-Only mode by host having mac address ['XX:XX:XX:XX:XX:XX']
Trying to use information from VMFS Heartbeat
Host owning the lock on file is XXX.XXX.XXX.XXX, lockMode : Read-Only
Total time taken : 14.671592491678894 seconds.

In this example, the lock is held by the same host on which the command was run.


2. Check whether a process is still running for the VM using the following command:

# esxcli vm process list |less

  ExampleVM_2:
   World ID: 1234567
   Process ID: 0
   VMX Cartel ID: 7654321
 UUID: XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX
   Display Name: ExampleVM_2
   Config File: /vmfs/volumes/1aaaa1a1-12345678-a11a-aaa111a1abba/ExampleVM_2/ExampleVM_2.vmx


   

3. A process is running for the VM, so terminate it with the following command, using the World ID from the previous output:



# esxcli vm process kill -t hard -w <World ID from the previous command>
# esxcli vm process kill -t hard -w 1234567
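The -w option of esxcli vm process kill expects the World ID, not the VMX Cartel ID, and the two are easy to mix up when reading the list by eye. The sketch below extracts the World ID for a given display name from the output of esxcli vm process list. The function name world_id_for_vm is hypothetical, and the parsing assumes the field layout shown in the sample output above (World ID listed before Display Name in each record).

```shell
# Hypothetical helper: read `esxcli vm process list` output on stdin and
# print the World ID of the VM whose Display Name equals $1.
world_id_for_vm() {
    awk -v name="$1" '
        # Remember the most recent World ID; it precedes Display Name in a record.
        /^[[:space:]]*World ID:/ { wid = $NF }
        /^[[:space:]]*Display Name:/ {
            line = $0
            sub(/^[[:space:]]*Display Name:[[:space:]]*/, "", line)
            sub(/[[:space:]]+$/, "", line)
            if (line == name) print wid
        }
    '
}
```

A possible use is wid=$(esxcli vm process list | world_id_for_vm ExampleVM_2), followed by esxcli vm process kill -t hard -w "$wid" once the value has been checked.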



4. Create a temporary folder and move all of the ctk files (*-ctk.vmdk) into it.
5. Back up the vmx file to the temporary folder, then disable change tracking in the vmx file.

(For each virtual disk, the .vmx file contains the entry scsix:x.ctkEnabled = "TRUE", which needs to be changed to "FALSE".)

6. Open each vmdk descriptor file and disable change tracking there as well by putting # at the beginning of the changeTrackPath=... line.
7. Remove the VM from the inventory and add it back so that the vmx file is reloaded.
8. Power on the VM. It should now power on without issues.
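Steps 4 to 6 can be sketched as a small shell function. This is an illustration under stated assumptions, not a supported tool: the function name disable_ctk and the folder name ctk_backup are hypothetical, the VM process is assumed to be stopped already, sed is assumed to support -i (as BusyBox and GNU sed do), and the large extent files (-flat, -delta, -sesparse) are skipped so that only the small text descriptor files are edited. Unlike the steps above, the sketch also copies each descriptor to the backup folder before changing it.

```shell
# Hypothetical helper: park the CTK files and disable change tracking for one VM.
# Usage: disable_ctk /vmfs/volumes/<datastore>/<vm_folder>
disable_ctk() {
    vm_dir="$1"
    backup_dir="$vm_dir/ctk_backup"   # assumed folder name

    mkdir -p "$backup_dir" || return 1

    # Step 4: move every change tracking file into the backup folder.
    for f in "$vm_dir"/*-ctk.vmdk; do
        [ -e "$f" ] && mv "$f" "$backup_dir"/
    done

    # Step 5: back up each .vmx, then change ctkEnabled from TRUE to FALSE.
    for vmx in "$vm_dir"/*.vmx; do
        [ -e "$vmx" ] || continue
        cp -p "$vmx" "$backup_dir"/ || return 1
        sed -i 's/^\([^#]*ctkEnabled[[:space:]]*=[[:space:]]*\)"[Tt][Rr][Uu][Ee]"/\1"FALSE"/' "$vmx"
    done

    # Step 6: comment out changeTrackPath in descriptor files only.
    for vmdk in "$vm_dir"/*.vmdk; do
        [ -e "$vmdk" ] || continue
        case "$vmdk" in
            *-flat.vmdk|*-delta.vmdk|*-sesparse.vmdk|*-ctk.vmdk) continue ;;
        esac
        if grep -q '^changeTrackPath' "$vmdk"; then
            cp -p "$vmdk" "$backup_dir"/ || return 1
            sed -i 's/^changeTrackPath/#&/' "$vmdk"
        fi
    done
    return 0
}
```

Keeping the originals in the backup folder makes it possible to restore the vmx and descriptor files if the VM still fails to power on.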

Note: The power-on can take a very long time, a few hours in some cases. During that time the power-on task stays at the same percentage and shows as in progress.
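Steps 7 and 8 can also be carried out from the ESXi shell with vim-cmd instead of the client. The sketch below is one possible way to do it. The function name reregister_and_power_on is hypothetical, the lookup assumes a display name without spaces (the getallvms output is split on whitespace), and it assumes that vim-cmd solo/registervm prints the new VM ID on standard output.

```shell
# Hypothetical helper: re-register a VM so the edited .vmx is reloaded, then power it on.
# Usage: reregister_and_power_on <vm_name> /vmfs/volumes/<datastore>/<vm_folder>/<vm_name>.vmx
reregister_and_power_on() {
    vm_name="$1"
    vmx_path="$2"

    # Look up the current inventory ID (first column) by name (second column).
    old_id=$(vim-cmd vmsvc/getallvms | awk -v n="$vm_name" '$2 == n { print $1 }')
    if [ -z "$old_id" ]; then
        echo "VM $vm_name not found in inventory" >&2
        return 1
    fi

    # Step 7: remove the VM from the inventory (files stay on disk), then add it back.
    vim-cmd vmsvc/unregister "$old_id" || return 1
    new_id=$(vim-cmd solo/registervm "$vmx_path") || return 1

    # Step 8: power on. This call may block for a long time.
    vim-cmd vmsvc/power.on "$new_id"
}
```

The VM ID usually changes when the VM is registered again, which is why the sketch captures the new ID rather than reusing the old one.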

Additional Information

Reference Articles:
https://knowledge.broadcom.com/external/article?legacyId=1020128
https://knowledge.broadcom.com/external/article?legacyId=2009244