Snapshot consolidation fails due to locks held by 3rd party backup software

Article ID: 321365

Products

  • VMware vSphere ESXi
  • VMware vSphere ESXi 7.0
  • VMware vSphere ESXi 8.0

Issue/Introduction

Virtual Machine file locks have many origins. This article covers unwanted file locks caused by backup applications with a proxy VM.

Potential Symptoms:

  • Snapshots cannot be committed.
  • Snapshots commit without errors and Snapshot Manager is no longer populated. However, a snapshot disk remains in the VM directory, and the VM still runs on that snapshot.
  • Consolidation fails with errors similar to:


Failed to lock the file

or

One or more disks are busy

or

Unable to consolidate virtual machine snapshots due to file lock

or

Unable to access file since it is locked

  • The VM summary tab displays messages similar to:


Snapshot consolidation required

or

Virtual machine disks consolidation is needed

or

Virtual machine Consolidation Needed status


Note: For additional symptoms and log entries, see the Additional Information section.

 

Environment

  • ESXi 6.x
  • ESXi 7.x
  • ESXi 8.x
  • ESX 9.x

Cause

This issue occurs if:

  • A backup proxy VM belonging to a 3rd party backup solution, or another VM, holds a lock on the base disk or on an earlier snapshot file of the VM that has the consolidation issue. This lock prevents snapshot consolidation from succeeding.
  • A 3rd party backup solution does not complete a backup gracefully, or fails to clean up properly afterward.
  • A 3rd party backup solution does not follow the documented VADP workflow. A typical example: the snapshot disk is not closed properly before the snapshot is deleted and consolidated.

Resolution

1. Investigate the locks to locate the 3rd party backup solution's proxy VM that has the disk attached. For the procedure, see the article "Investigating virtual machine file locks on ESXi hosts".
 
2. If the issue is caused by a backup that used VDDK HotAdd transport mode, remove the affected VM's disk(s) from the backup proxy VM.

Caution: Make sure there is no backup job running on the VM that has the consolidation issue.

  1. Right-click the backup proxy VM.
  2. Click Edit Settings.
  3. Expand all the Hard Disk(s).
  4. Select the Hard Disk(s) belonging to the VM that has the problem.
  5. Click the X beside the Hard Disk to detach it from the backup proxy VM.
     Caution: Do NOT select Delete files from the datastore.
  6. Click OK.
  7. Consolidate/Delete the snapshot on the VM.
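
Before editing the proxy VM by hand, it can help to list which of its hard disks do not live in its own folder, since those are the candidates left behind by a HotAdd backup. The sketch below is a minimal illustration, not a vendor tool. It assumes the disk paths have already been collected as datastore-path strings (for example from the Edit Settings dialog), and that the proxy keeps all of its own disks in a single folder. The names `foreign_disks`, `proxy_folder`, and the sample paths are hypothetical.

```python
import re

# Datastore path form shown by vSphere, e.g. "[datastore1] TESTVM/TESTVM_1-000001.vmdk".
_DS_PATH = re.compile(r"^\[(?P<datastore>[^\]]+)\]\s*(?P<folder>[^/]+)/(?P<file>.+)$")


def foreign_disks(proxy_folder, disk_paths):
    """Return the disk paths whose VM folder differs from the proxy's own folder."""
    foreign = []
    for path in disk_paths:
        match = _DS_PATH.match(path)
        if match is None:
            continue  # not a "[datastore] folder/file" path; skip rather than guess
        if match.group("folder") != proxy_folder:
            foreign.append(path)
    return foreign


# Hypothetical inventory of the backup proxy VM's hard disks.
proxy_disks = [
    "[datastore1] BACKUP-PROXY/BACKUP-PROXY.vmdk",
    "[datastore1] TESTVM/TESTVM_1-000001.vmdk",
]
print(foreign_disks("BACKUP-PROXY", proxy_disks))
```

Any path this reports still needs to be checked against the VM that has the consolidation issue before the disk is detached.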


3. If the issue is caused by a backup that used VDDK NBD, NBDSSL, or SAN transport mode, kill the specific backup process on the backup proxy.

Caution: Make sure there is no backup job running on the VM that has the consolidation issue.

  1. You might need the 3rd party backup vendor's help to identify the backup process.
  2. Alternatively, make sure no backup job is running on the backup proxy, then reboot the backup proxy.


4. If the issue is caused by an incorrect backup workflow in a 3rd party backup solution, the backup vendor should implement proper failure cleanup.

  1. The backup software should ensure that every disk it opens is closed, so that the lock on the ESX hosts is released.
  2. If the backup process is killed, it should recover by calling VDDK cleanup functions such as VixDiskLib_Cleanup and VixDiskLib_EndAccess.
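
The "every open is matched by a close" rule can be illustrated with a small sketch. VDDK itself is a C library, so the code below does not call it. `StubDiskLib` is a hypothetical stand-in that only records the order of calls, and `opened_disk` is an illustrative helper name. The point is the shape of the workflow: the close runs even when the backup step fails.

```python
from contextlib import contextmanager


class StubDiskLib:
    """Hypothetical stand-in for a VDDK binding; it only records call order."""

    def __init__(self):
        self.calls = []

    def open(self, path):
        self.calls.append(("open", path))
        return path  # in this stub the "handle" is just the path

    def close(self, handle):
        self.calls.append(("close", handle))


@contextmanager
def opened_disk(lib, path):
    """Open a disk and guarantee the matching close, even if the backup fails."""
    handle = lib.open(path)
    try:
        yield handle
    finally:
        lib.close(handle)


lib = StubDiskLib()
try:
    with opened_disk(lib, "[datastore1] TESTVM/TESTVM_1.vmdk") as disk:
        raise RuntimeError("simulated backup failure")
except RuntimeError:
    pass

print(lib.calls)
```

A guard like this only helps while the process is still running. If the process is killed outright, no in-process cleanup executes, which is why a separate recovery path is needed for that case.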

Additional Information

You might experience these additional symptoms:

 

  • In the vmware.log, you see errors similar to:

    vmx| ConsolidateOnlineCB: nextState = 2 uid 3
    vmx| Foundry operation failed with system error: Device or resource busy (16), translated to 5
    vmx| ConsolidateOnlineCB: Done with consolidate

 

  • When you attempt to remove the datastore, you see this error:
    The resource '<VMFS-UUID>' is in use.

 

  • If you are running third-party backup software, consolidation might fail with the following errors in the vmware.log file:

    vcpu-0| Vix: [8803 mainDispatch.c:4084]: VMAutomation_ReportPowerOpFinished: statevar=3, newAppState=1881, success=1 additionalError=0
    vcpu-0| Vix: [8803 vigorCommands.c:577]: VigorSnapshotManagerConsolidateCallback: snapshotErr = Failed to lock the file (5:4008)
    vcpu-0| SnapshotVMXConsolidateOnlineCB: Destroying thread 6
    vcpu-0| Turning off snapshot info cache.
    vcpu-0| Turning off snapshot disk cache.
    vcpu-0| SnapshotVMXConsolidateOnlineCB: Done with consolidate
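
When a vmware.log file is long, the two failure signatures shown in the excerpts above can be pulled out with a short script. This is a rough sketch that matches only the line shapes quoted in this article. Other builds may log differently, and the function name `consolidation_errors` is hypothetical.

```python
import re

# Patterns modeled on the vmware.log excerpts quoted in this article.
_SNAPSHOT_ERR = re.compile(r"snapshotErr = (?P<message>.+?) \((?P<code>\d+:\d+)\)")
_FOUNDRY_ERR = re.compile(
    r"Foundry operation failed with system error: (?P<message>.+?) \((?P<errno>\d+)\)"
)


def consolidation_errors(lines):
    """Return the error messages found in consolidation-related log lines."""
    found = []
    for line in lines:
        for pattern in (_SNAPSHOT_ERR, _FOUNDRY_ERR):
            match = pattern.search(line)
            if match:
                found.append(match.group("message"))
    return found


sample_log = [
    "vmx| ConsolidateOnlineCB: nextState = 2 uid 3",
    "vmx| Foundry operation failed with system error: Device or resource busy (16), translated to 5",
    "vcpu-0| Vix: [8803 vigorCommands.c:577]: VigorSnapshotManagerConsolidateCallback: "
    "snapshotErr = Failed to lock the file (5:4008)",
]
print(consolidation_errors(sample_log))
```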

 

  • In the vmkernel or messages log files, you see entries similar to:

    vmkernel: gen 2141, mode 1, owner 4b94bb81-XXXXXXXX-3bd1-XXXXXXXXXXX mtime 244622]on volume 'LUN03'.
    vmkernel: [YYYY-MM-DDTHH:MM:SS] cpu2:4109)FS3: 2890: [Requested mode: 1] Lock [type 10c00001 offset 7505920 v 920, hb offset 3510272
    vmkernel: gen 2141, mode 1, owner 4b94bb81-XXXXXXXX-3bd1-XXXXXXXXXXX mtime 244622] is not free on volume 'LUN03'
    vmkernel: [YYYY-MM-DDTHH:MM:SS] cpu2:4111)FS3: 2798: [Requested mode: 1] Checking liveness of lock holders [type 10c00001 offset 7313408 v 796, hb offset 3510272
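
The `owner` field in these lock messages is what identifies the lock holder. VMware's lock-investigation guidance describes the last segment of that field as the MAC address of the ESXi host holding the lock; the sketch below takes that as an assumption and reformats the segment so it can be compared against host NICs. The function name `lock_owner_mac` and the sample owner value are made up for illustration. The masked owner in the excerpt above is deliberately left unmatched.

```python
import re

# owner <8 hex>-<8 hex>-<4 hex>-<12 hex>; the last group is assumed to be a MAC address.
_OWNER = re.compile(
    r"owner [0-9a-fA-F]{8}-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-(?P<mac>[0-9a-fA-F]{12})\b"
)


def lock_owner_mac(line):
    """Return the lock owner's MAC as aa:bb:cc:dd:ee:ff, or None if not present."""
    match = _OWNER.search(line)
    if match is None:
        return None
    mac = match.group("mac").lower()
    return ":".join(mac[i:i + 2] for i in range(0, 12, 2))


# Synthetic owner value, not taken from a real host.
sample = "gen 2141, mode 1, owner 00000000-00000000-0000-aabbccddeeff mtime 244622"
print(lock_owner_mac(sample))  # aa:bb:cc:dd:ee:ff
```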

 

  • In the hostd.log file, you see entries similar to these during the snapshot delete process:

    DISKLIB-LIB : Failed to delete disk '/vmfs/volumes/4c5f4b7a-XXXXXXXX-32ad-XXXXXXXXX/TESTVM/TESTVM_1-000001.vmdk' or one of its components: Device or resource busy
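
The hostd.log entry already names the disk that could not be deleted, so it is a convenient starting point for the lock investigation. The sketch below extracts the path, VM folder, and reason from a line of the shape quoted above. It assumes the common `-NNNNNN.vmdk` suffix for snapshot disks, and `busy_disk` is a hypothetical helper name.

```python
import posixpath
import re

# Modeled on the hostd.log excerpt quoted in this article.
_FAILED_DELETE = re.compile(r"Failed to delete disk '(?P<path>[^']+)'.*: (?P<reason>.+)$")
# Assumption: snapshot disks end in a six-digit sequence, e.g. "-000001.vmdk".
_SNAPSHOT_DISK = re.compile(r"-\d{6}\.vmdk$")


def busy_disk(line):
    """Return details of the disk named in a 'Failed to delete disk' line, or None."""
    match = _FAILED_DELETE.search(line)
    if match is None:
        return None
    path = match.group("path")
    return {
        "path": path,
        "vm_folder": posixpath.basename(posixpath.dirname(path)),
        "file": posixpath.basename(path),
        "is_snapshot_disk": bool(_SNAPSHOT_DISK.search(path)),
        "reason": match.group("reason").strip(),
    }


line = (
    "DISKLIB-LIB : Failed to delete disk "
    "'/vmfs/volumes/4c5f4b7a-XXXXXXXX-32ad-XXXXXXXXX/TESTVM/TESTVM_1-000001.vmdk' "
    "or one of its components: Device or resource busy"
)
print(busy_disk(line))
```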

 

  • When you attempt to consolidate by right-clicking the virtual machine and clicking Snapshot > Consolidate, you see errors similar to:

    Consolidate virtual machine disk files <hostname> Unable to access file <unspecified filename> since it is locked
    Consolidation failed for disk node 'scsi0:8': msg.fileio.lock.