Troubleshooting issues resulting from locked virtual disks

Products

VMware vSphere ESXi

Issue/Introduction

This article provides information on troubleshooting locked virtual disks. Locked disks are a recurring issue with snapshot-based backup solutions, with unsupported disk formats, and as a result of general storage issues.

As multiple causes for locks are possible, this article provides a step-by-step process with multiple troubleshooting checkpoints. Each checkpoint is marked with the sentence "Consolidation/Power On should now be possible."

Symptoms:

Starting in vSphere 8.0 U2, when viewing a failed "Power On virtual machine" task status, the UI shows:

Unable to access file <file path to vmdk> since it is locked
KB 2107795
filePath: <file path to vmdk>
host: <hostname> , <host IP>
mac: ['<mac address>']
id: <id number>
worldName: vmx
lockMode: Exclusive

Powering on a virtual machine fails with errors similar to:

Unable to access a file filename since it is locked

Unable to access virtual machine configuration

Consolidating virtual machine snapshots fails with errors similar to:

An error occurred while consolidating disks: msg.snapshot.error-DISKLOCKED

An error occurred while consolidating disks: msg.fileio.lock.

After successfully running a snapshot-based backup, the virtual machine overview tab shows a message similar to:

Virtual Disk Consolidation is required

Environment

VMware vSphere ESXi 6.x, 7.x, 8.x
VMware vCenter Server

Cause

Locked virtual disks can occur as a result of, but are not limited to, the following:

The owning ESXi host holds locks on all files in use by a powered-on virtual machine to facilitate read and write access.
Other locks may be created by hot-adding disks to snapshot-based backup appliances during the backup process.
Failure to create a lock / start a virtual machine can occur if an unsupported disk format is used or if a lock is already present.

Resolution

Troubleshooting Locked Virtual Disks on VMFS Volumes

1. Write details to the logs.

Snapshot and power-on operations are written to the /vmfs/volumes/<vm-datastore>/<vm-name>/vmware.log file. Snapshot operations are written to this file only if the virtual machine is powered on.

If there is a consolidation issue and the virtual machine is powered on, trigger a consolidate task to write a new set of log entries to assist in troubleshooting the problem.

If the virtual machine fails to power on, skip this item and Step 3.

2. To identify a disk lock, clear all expected locks on the disks.

An exclusive lock through the owning ESXi host protects a powered-on virtual machine. Power down the virtual machine, if applicable.

All remaining locks are considered unexpected and need investigation.

3. Find the locked file.

Note: Starting in vSphere 8.0 U2, the owner of the lock may be found by viewing the Status of the failed 'Power On virtual machine' task.

Refer to VMware virtual machine file lock on VMFS datastore before moving on (for VMFS datastore).
Refer to Understanding the NFS .lck lock file to understand the ESX host and NFS filename it refers to (for NFS datastore).

If the above troubleshooting does not resolve the issue, there may be another type of lock not caused by hot-add or another running virtual machine with access to the same disk. Proceed to Step 4.

4. Find the VM/process/service holding the lock.

SSH as root to the ESXi host identified by the MAC address returned in Step 3.
Find the process responsible for the lock by running:

ps | grep -i <vm-name>

If this command returns a non-empty output, there is a process locking the disks. If it is empty / no output, skip to Step 4L.
In the above output, in the third column, look for vmm0, note the corresponding ID in the first column, and run this command to output the World ID.

esxcli vm process list | grep -i <ID> -B1

The output will return the World ID of the virtual machine corresponding to the process ID. This virtual machine is holding the lock.
Remove the disk from this virtual machine or power down the virtual machine.

Consolidation/Power On should now be possible.
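
The vmm0 lookup described above can be scripted. The following is a minimal sketch rather than VMware tooling: the function names are hypothetical, and it assumes the ps column order stated in this article (ID in the first column, world name in the third) and that esxcli vm process list prints each virtual machine's name on the line directly above its "World ID:" line, as the -B1 option in the command above implies.

```shell
# Hypothetical helpers; they only filter text read from standard input.

# Print the first-column ID of every line whose third column starts with "vmm0".
vmm0_ids() {
    awk '$3 ~ /^vmm0/ { print $1 }'
}

# Print the line that directly precedes "World ID: <id>"
# (assumed to be the name of the virtual machine holding the lock).
# $1: ID reported by vmm0_ids
vm_name_for_world_id() {
    awk -v id="$1" '
        $1 == "World" && $2 == "ID:" && $3 == id { print prev }
        { prev = $0 }
    '
}

# Intended use on the ESXi host (not executed by this block):
#   ps | grep -i <vm-name> | vmm0_ids
#   esxcli vm process list | vm_name_for_world_id <ID>
```

Both functions are read-only filters, so removing the disk or powering down the identified virtual machine remains a deliberate manual step.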

If the above step fails, find the service/task holding the lock by running the below command:

lsof | grep -i <vm-name>

If this returns an output, a service or task is locking the disks. This should only happen on the same ESXi host where the virtual machine is registered.
To find more information about the task, run the below command:

vim-cmd vmsvc/getallvms | grep -i <vm-name>

Note the ID from the above step's output (the number at the beginning of the line) and run the below command:

vim-cmd vmsvc/get.tasklist <ID>

The output should look like:

vim-cmd vmsvc/get.tasklist 109
(ManagedObjectReference) [
'vim.Task:haTask-109-vim.vm.Snapshot.remove-########'
]

Note the task ID after ":"
To get more information about the specific task, run the below command:

vim-cmd vimsvc/task_info <Task ID>

For example:

vim-cmd vimsvc/task_info haTask-109-vim.vm.Snapshot.remove-########
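
The ID and task ID extraction described above can be sketched as two small text filters. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the virtual machine ID is assumed to be the numeric first field of the matching getallvms line, and the task list is assumed to look like the sample output shown in this article.

```shell
# Hypothetical helpers; they only filter text read from standard input.

# Print the leading numeric ID of each getallvms line that mentions the VM name.
# $1: VM name (matched case-insensitively as a fixed string, like grep -i)
vm_ids_for_name() {
    grep -i -F -- "$1" | awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Print the task ID that follows "vim.Task:" on each get.tasklist output line.
task_ids() {
    sed -n "s/.*'vim\.Task:\([^']*\)'.*/\1/p"
}

# Intended use on the ESXi host (not executed by this block):
#   vim-cmd vmsvc/getallvms | vm_ids_for_name <vm-name>
#   vim-cmd vmsvc/get.tasklist <ID> | task_ids
```

If several virtual machines share part of the name, vm_ids_for_name prints more than one ID, so the result should be checked before it is passed to vim-cmd vimsvc/task_info.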

This outputs useful information (shortened) to identify the task:

task = 'vim.Task:haTask-109-vim.vm.Snapshot.remove-########',
state = "running",
cancelled = false,
cancelable = true,
progress = 75,
startTime = "YYYY-MM-DDTHH:MM:SS.MS<Time_Zone>",

Restart the management agents of this ESXi host to clear the service/task. For more information, see Restarting the Management agents on an ESXi or ESX host.

Note: Deactivate HA Host Monitoring first to prevent an unwanted VM failover.

After a couple of seconds, run Step H (vim-cmd vmsvc/get.tasklist <ID>) again; it should return an empty output.

Consolidation/Power On should now be possible.

Sometimes, this process needs more clean-up work. Rerun the vim-cmd vmsvc/getallvms command to verify that the virtual machine is still registered (alternatively, check in the vCenter Server inventory whether the virtual machine is shown as invalid).

vim-cmd vmsvc/getallvms | grep <ID>

Example:

Skipping invalid VM '109'

Note: Skip the rest of this section if no longer receiving an invalid VM output.

This output shows that there is a conflict. The virtual machine might still be running, but the ID is unassigned. Run the following command:

esxcli vm process list | grep -i <vm-name> -B5

The output shows the virtual machine listed and additional information. Note the World ID of this virtual machine.

To kill the virtual machine's process (hard shutdown), run the below command:

esxcli vm process kill -t force -w <World ID>

Note: This is a hard shutdown of the virtual machine. If the virtual machine is still responsive, try instead to connect to it (for example, via RDP) and shut it down from the guest OS.

Run the above esxcli vm process list command again (after a few seconds); the output should now be empty. Remove and re-add the virtual machine from/to the vCenter inventory.

Consolidation/Power On should now be possible.
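
Looking up the World ID for the kill command above can be sketched as a text filter over the esxcli vm process list output. This is a hedged sketch: the function name is hypothetical, and it assumes each entry contains a "World ID:" line followed later by a "Display Name:" line.

```shell
# Hypothetical helper; it only filters text read from standard input.
# Print the World ID of the entry whose "Display Name:" equals $1.
# Assumes "World ID:" appears before "Display Name:" within each entry.
world_id_for_display_name() {
    awk -v name="$1" '
        /^[ \t]*World ID:/ { wid = $3 }
        /^[ \t]*Display Name:/ {
            sub(/^[ \t]*Display Name:[ \t]*/, "")
            if ($0 == name) print wid
        }
    '
}

# Intended use on the ESXi host (not executed by this block):
#   esxcli vm process list | world_id_for_display_name "<vm-name>"
```

The filter matches the display name exactly, which avoids picking up a different virtual machine whose name merely contains the search string. The kill command itself is deliberately left out of the helper.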

If consolidating or powering on the virtual machine still fails, open a support request with VMware.

Troubleshooting Locked Virtual Disks on NFS Volumes

Locking issues on NFS datastores differ from locking issues on VMFS datastores because the locking mechanism is different. NFS does not provide block-level access, so SCSI locks cannot be used. Instead, NFS locks are implemented by creating lock files on the NFS server. When browsing an NFS datastore with hidden files shown, a number of .lck-#### files are visible. Because of this locking mechanism, the command-line tools used on VMFS to determine lock holders cannot be used.

Power down the virtual machine, backup appliances, and other virtual machines that could access the virtual disks.

Find the lock

SSH as root to the ESXi host where the affected virtual machine is registered and browse the datastore.
Run this command to show the hidden .lck-#### files:

ls -lha

Note: If the VM is powered down and there is no other access to any of the virtual disks, there should be no .lck-#### file.

Get more information about the lock

If there is a .lck-#### file, run the following command to obtain further information on its origin:

hexdump -C .lck-#### (replace with correct filename)

Output will provide the hostname of the lock owner.

For example: esxi.example.com
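
Reading the hostname out of the hexdump output can also be approximated by printing only the readable text in the lock file. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it assumes the hostname is stored as plain printable characters, as the hexdump -C output described above suggests.

```shell
# Hypothetical helper: print runs of at least four printable characters
# found in the file given as $1 (a rough stand-in for reading the
# right-hand text column of hexdump -C).
printable_strings() {
    LC_ALL=C tr -c '[:print:]' '\n' < "$1" | awk 'length($0) >= 4'
}

# Intended use on the ESXi host (not executed by this block):
#   printable_strings .lck-####   (replace with correct filename)
```

The four-character minimum is an arbitrary threshold chosen to suppress short noise; the lock owner's hostname should appear among the remaining lines.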

Remove the lock

Delete this file using the rm command (only if the virtual machine is powered off).

rm .lck-#### (replace with correct filename)

Run the same ls -lha command a couple of seconds later to check whether the lock was rewritten.

If it is rewritten, investigate which virtual machines this ESXi host owns to find the virtual machine causing this issue (usually a backup appliance or an ISO from NFS mounted as CD/DVD).

If it is not rewritten, Consolidation/Power On should now be possible.
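
The check for a rewritten lock can be sketched as follows. The function names and the default wait time are hypothetical choices for illustration, and neither function deletes anything.

```shell
# Hypothetical helpers for checking NFS lock files; both are read-only.

# List the .lck-* files in directory $1 (default: current directory).
nfs_lock_files() {
    ls -a "${1:-.}" | grep '^\.lck-' || true
}

# Wait $3 seconds (default: 5), then succeed if lock file $2
# exists in directory $1, which means the lock was rewritten.
lock_rewritten() {
    sleep "${3:-5}"
    [ -e "$1/$2" ]
}

# Intended use after removing a lock file (not executed by this block):
#   if lock_rewritten . .lck-#### ; then echo "lock was rewritten"; fi
```

A successful return from lock_rewritten corresponds to the "rewritten" case above, in which the owning ESXi host still has a virtual machine accessing the disk.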

Issue is not due to .lck-#### files but due to general connectivity issues

This article does not consider any issues that might arise due to general NFS connectivity issues. For general troubleshooting, refer to Troubleshooting connectivity issues to an NFS datastore on ESX and ESXi hosts.

Additional Information

ESXi 6.x and later supports NFSv4.1, which adds Kerberos 5 authentication among other benefits.

For vSphere 6.x and later, VMware recommends the following when mounting NFS datastores on different ESXi hosts:

Do not mix NFS protocol versions on the ESXi hosts.
Configure the Network Attached Storage (NAS) to use only one protocol version.
Do not mix IPv4 and IPv6 across the ESXi hosts' connections to the NFS server.

Failed to power on virtual machine

Determining if a virtual disk is attached to another virtual machine
Unable to delete the virtual machine snapshots
Restarting the Management agents in ESXi
Troubleshooting connectivity issues to an NFS datastore on ESX and ESXi hosts
Investigating virtual machine file locks on ESXi
Snapshot removal task stops at 99% in ESXi/ESX
Error: "Device or Resource Busy" while attempting to delete a Datastore.
Types of supported Virtual Disks on ESXi/ESX hosts
Estimate the time required to consolidate virtual machine snapshots