Troubleshooting Locked Virtual Disks on VMFS Volumes
- Write details to the logs.
Snapshot and power-on operations are written to the /vmfs/volumes/<vm-datastore>/<vm-name>/vmware.log file. Snapshot operations are logged to this file only if the virtual machine is powered on.
If there is a consolidation issue and the virtual machine is powered on, trigger a consolidate task to write a new set of log entries to assist in troubleshooting the problem.
If the virtual machine fails to power on, skip this item and Step 3.
- To identify a disk lock, clear all expected locks on the disks.
An exclusive lock through the owning ESXi host protects a powered-on virtual machine. Power down the virtual machine, if applicable.
All remaining locks are considered unexpected and need investigation.
- Find the locked file.
Note: Starting in vSphere 8.0 U2, the owner of the lock may be found by viewing the Status of the failed 'Power On virtual machine' task.
If the above troubleshooting does not resolve the issue, there may be another type of lock not caused by hot-add or another running virtual machine with access to the same disk. Proceed to Step 4.
- Find the VM/process/service holding the lock
- SSH as root to the ESXi host identified by the MAC address returned in Step 3.
- Find the process responsible for the lock by running:
ps | grep -i <vm-name>
- If this command returns any output, a process is locking the disks. If it returns no output, skip to Step 4L.
- In the third column of the above output, look for vmm0 and note the corresponding ID in the first column. Then run this command to find the World ID:
esxcli vm process list | grep -i <ID> -B1
The output will return the World ID of the virtual machine corresponding to the process ID. This virtual machine is holding the lock.
- Remove the disk from this virtual machine or power down the virtual machine.
Consolidation/Power On should now be possible.
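The vmm0 lookup described above can be scripted. The following is a minimal POSIX shell sketch, not a supported tool: the function name vmm0_ids and the sample lines are hypothetical, and the column layout is assumed to match the description above (ID in the first column, vmm0 in the third).

```shell
# Print the first-column ID of every line whose third column starts with "vmm0".
# Assumption: the input uses the column layout described in the step above.
vmm0_ids() {
    awk '$3 ~ /^vmm0/ { print $1 }'
}

# Hypothetical sample lines standing in for `ps | grep -i <vm-name>` output.
sample='1000101 1000100 vmx testvm
1000105 1000100 vmm0:testvm testvm
1000107 1000100 vmx-vcpu-0:testvm testvm'

printf '%s\n' "$sample" | vmm0_ids
```

On a host, the helper would be fed from the real command, for example ps | grep -i <vm-name> | vmm0_ids, and the printed ID passed to the esxcli vm process list command shown above.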
- If the above step fails, find the service/task holding the lock by running the below command:
lsof | grep -i <vm-name>
If this returns an output, a service or task is locking the disks. This should only happen on the same ESXi host where the virtual machine is registered.
- To find more information about the task, run the below command:
vim-cmd vmsvc/getallvms | grep -i <vm-name>
- Note the ID from the above step's output (number at the beginning of the line) and run the below command:
vim-cmd vmsvc/get.tasklist <ID>
The output should look like:
vim-cmd vmsvc/get.tasklist 109
(ManagedObjectReference) [
'vim.Task:haTask-109-vim.vm.Snapshot.remove-92744898'
]
Note the task ID after ":"
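Picking the task ID out of the get.tasklist output can be sketched in POSIX shell as follows. The function name task_ids is hypothetical; the sample text is the example output shown above, and the sketch assumes every task appears on its own line in the 'vim.Task:<Task ID>' form.

```shell
# Print the part after "vim.Task:" for each task line in the input.
# Assumption: one task per line, quoted as in the example output above.
task_ids() {
    sed -n "s/^[[:space:]]*'vim\.Task:\([^']*\)'.*/\1/p"
}

# Sample copied from the example output in this article.
sample="(ManagedObjectReference) [
'vim.Task:haTask-109-vim.vm.Snapshot.remove-92744898'
]"

printf '%s\n' "$sample" | task_ids
# prints: haTask-109-vim.vm.Snapshot.remove-92744898
```

On a host, the same helper would read from vim-cmd vmsvc/get.tasklist <ID> instead of the sample text.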
- To get more information about the specific task, run the below command:
vim-cmd vimsvc/task_info <Task ID>
For example:
vim-cmd vimsvc/task_info haTask-109-vim.vm.Snapshot.remove-92744898
This outputs useful information (shortened) to identify the task:
task = 'vim.Task:haTask-109-vim.vm.Snapshot.remove-92744898',
state = "running",
cancelled = false,
cancelable = true,
progress = 75,
startTime = "2014-12-06T03:47:26.30322Z",
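To check a single field of the task information, such as state or progress, a small filter may help. This is a sketch under the assumption that each field is printed as "name = value," on its own line, as in the shortened output above; the function name task_field is hypothetical.

```shell
# Print the value of one field from "name = value," lines.
# Assumption: one field per line, as in the shortened example output above.
task_field() {
    awk -v key="$1" '
        $1 == key && $2 == "=" {
            val = $3
            gsub(/[",]/, "", val)   # drop double quotes and the trailing comma
            print val
        }'
}

# Sample lines copied from the example output in this article.
sample='state = "running",
cancelled = false,
progress = 75,'

printf '%s\n' "$sample" | task_field state      # prints: running
printf '%s\n' "$sample" | task_field progress   # prints: 75
```

On a host, the input would come from vim-cmd vimsvc/task_info <Task ID>. A task that still reports state "running" after a long time is the candidate for the clean-up described in the next step.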
- Restart the management agents of this ESXi host to clear the service/task. For more information, see Restarting the Management agents on an ESXi or ESX host.
Note: Deactivate HA Host Monitoring first to prevent an unwanted VM failover.
After a couple of seconds, run Step H (vim-cmd vmsvc/get.tasklist <ID>) again; it should return an empty output.
Consolidation/Power On should now be possible.
- Sometimes, this process needs more clean-up work. Rerun the vim-cmd vmsvc/getallvms command to verify that the virtual machine is still registered (alternatively, check in the vCenter Server inventory UI whether the VM's state is displayed as invalid).
vim-cmd vmsvc/getallvms | grep <ID>
Example:
Skipping invalid VM '109'
Note: Skip the rest of this section if the invalid VM output no longer appears.
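The check for the invalid VM message can be sketched as a filter that extracts the affected ID. The function name invalid_vm_ids is hypothetical, and the sketch assumes the message has exactly the form shown in the example above. If the message is written to the error stream rather than standard output on a given build (an assumption to verify), append 2>&1 to the vim-cmd command before piping.

```shell
# Print the ID from each "Skipping invalid VM '<ID>'" line in the input.
# Assumption: the message format matches the example shown in this article.
invalid_vm_ids() {
    sed -n "s/^Skipping invalid VM '\([0-9][0-9]*\)'.*/\1/p"
}

# Sample line copied from the example above.
printf '%s\n' "Skipping invalid VM '109'" | invalid_vm_ids
# prints: 109
```

An empty result means no invalid VM was reported, in which case the remaining clean-up steps can be skipped.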
- The Skipping invalid VM output indicates a conflict: the virtual machine might still be running, but its ID is unassigned. Run the following command:
esxcli vm process list | grep -i <vm-name> -B5
The output shows the virtual machine listed and additional information. Note the World ID of this virtual machine.
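Reading the World ID out of the process list can also be scripted. The sketch below assumes the listing prints a "World ID:" line before the "Display Name:" line of each virtual machine; the function name world_id_for and the sample values are hypothetical. Unlike the grep -i command above, it matches the display name exactly.

```shell
# Print the World ID of the VM whose Display Name equals the given name.
# Assumption: "World ID:" precedes "Display Name:" within each VM entry.
world_id_for() {
    awk -v name="$1" '
        /^[[:space:]]*World ID:/ { wid = $NF }
        /^[[:space:]]*Display Name:/ {
            sub(/^[[:space:]]*Display Name:[[:space:]]*/, "")
            if ($0 == name) print wid
        }'
}

# Hypothetical sample standing in for `esxcli vm process list` output.
sample='testvm
   World ID: 1000105
   Process ID: 0
   Display Name: testvm
othervm
   World ID: 1000200
   Process ID: 0
   Display Name: othervm'

printf '%s\n' "$sample" | world_id_for testvm
# prints: 1000105
```

On a host, the input would come from esxcli vm process list, and the printed value is the <World ID> used in the kill command below.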
- To kill the virtual machine's process (hard shutdown), run the below command:
esxcli vm process kill -t force -w <World ID>
Note: This kills the virtual machine process (hard shutdown). Alternatively, try to RDP to this virtual machine and shut it down from the Guest-OS level if the virtual machine is responsive.
- Run the above esxcli vm process list command again (after a few seconds); the output should now be empty. Remove the virtual machine from the vCenter inventory and re-add it.
Consolidation/Power On should now be possible.
- If consolidating or powering on the virtual machine still fails, open a support request with VMware.
Troubleshooting Locked Virtual Disks on NFS Volumes
Locking issues on NFS datastores differ from locking issues on VMFS datastores due to the difference in the locking mechanism. NFS does not provide block-level access, so SCSI locks cannot be used. Instead, NFS locks are implemented by creating lock files on the NFS server. When an NFS datastore is browsed with hidden files shown, a number of .lck-#### files are visible. Due to this locking mechanism, the command line tools used on VMFS to determine lock holders cannot be used.
Power down the virtual machine, backup appliances, and other virtual machines that could access the virtual disks.
- Find the lock
- SSH as root to the ESXi host where the affected virtual machine is registered and browse the datastore.
- Run this command to show the hidden .lck-#### files:
ls -lha
Note: If the VM is powered down and there is no other access to any of the virtual disks, there should be no .lck-#### file.
- To get more information about the lock
If there is a .lck-#### file, run the following command to obtain further information on its origin:
hexdump -C .lck-####
(replace .lck-#### with the correct filename)
The output includes the hostname of the lock owner.
For example: esxi.example.com
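As an alternative way to read the hostname, the printable text in a lock file can be pulled out without scanning the hex dump by eye. This is a sketch only: the function name printable_runs is hypothetical, the internal layout of the lock file is not assumed beyond it containing the hostname as plain text, and the availability of tr and awk in the ESXi shell should be verified first.

```shell
# Print every run of 4 or more printable characters found in a file.
# Assumption: the owner hostname is stored as plain text inside the file.
printable_runs() {
    tr -cs '[:print:]' '\n' < "$1" | awk 'length($0) >= 4'
}

# Demo on a made-up binary file, not a real lock file.
demo=$(mktemp)
printf '\000\001esxi.example.com\000\002ab\000' > "$demo"
printable_runs "$demo"
# prints: esxi.example.com
rm -f "$demo"
```

On a host, the call would be printable_runs .lck-#### with the correct filename substituted.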
- Remove the lock
Delete this file using the rm command (only if the virtual machine is powered off).
rm .lck-####
(replace .lck-#### with the correct filename)
Run the ls -lha command again a couple of seconds later to check whether the lock was rewritten.
If it is rewritten, investigate which virtual machines this ESXi host owns to find the virtual machine causing this issue (usually a backup appliance or an ISO from NFS mounted as CD/DVD).
If it is not rewritten, Consolidation/Power On should now be possible.
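The check for a rewritten lock can be sketched as a helper that lists only the lock files in a directory. The function name list_lock_files is hypothetical, and the .lck- prefix is taken from the file names described above.

```shell
# List the hidden .lck-* entries in the given directory, one per line.
# Prints nothing (and returns non-zero) when no lock file is present.
list_lock_files() {
    ls -a "$1" | grep '^\.lck-'
}

# Example use from inside the virtual machine's directory:
#   list_lock_files .
#   sleep 5
#   list_lock_files .
```

If the second call prints a lock file after it was deleted, the lock is being rewritten and the owning host needs further investigation as described above.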
- The issue is not due to .lck-#### files but due to general connectivity issues