Investigating virtual disk file locks on vSAN

Article ID: 326800

Products

VMware vSAN

Issue/Introduction

On a VMFS datastore, we often check for file locks on the -flat or -delta virtual disk file. However, these files don't exist on vSAN because it is an object-based system. This article details how to check for locks on vSAN virtual disk objects.

Symptoms:
File lock issues can cause various problems, including but not limited to the inability to power on VMs or consolidate snapshots.  You will need to identify which hosts may be holding locks on a virtual disk file residing on vSAN.

Environment

VMware vSAN 7.0.x
VMware vSAN 6.7.x

Cause

vSAN has a specific object type, vdisk, for virtual disks.  They are not stored with the configuration files for the VM in the namespace directory.

Resolution

First, check whether any backup proxy servers are in use. If so, check whether the affected disk is still mounted to a proxy server. If you find the disk attached to a proxy server, remove the disk from the proxy server, ensuring "Delete from disk" is NOT selected.

Note: There may be more than one proxy server in use. Make sure to check all proxy servers.

vSAN uses .lck files. Each .lck file is named after the UUID of the vSAN object it represents.

To find this UUID in the disk descriptor, change directory to the VM namespace.

For example: cd /vmfs/volumes/vsanDatastore/<VM_Namespace>

Then run grep RW VMDiskName.vmdk

You'll see output similar to this:

# Extent description
RW 209715200 VMFS "vsan://e7c66759-680f-e86b-798d-a0369fa131f0"


The UUID “e7c66759-680f-e86b-798d-a0369fa131f0” is the vSAN object representing the vdisk for that descriptor.

Note: If you get a "Device or resource busy" error, SSH to the host the VM is registered to and work from that host.
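
The descriptor lookup above can be scripted. The following is a minimal sketch, assuming the extent line has the format shown in the example output; the function names vsan_uuid_from_descriptor and lck_name_for_uuid are hypothetical helpers, not ESXi commands.

```shell
# Hypothetical helpers (not part of ESXi); both use only sed and printf.

# Print the vSAN object UUID from each extent line of a descriptor.
# Reads the files named as arguments, or stdin when none are given.
vsan_uuid_from_descriptor() {
    sed -n 's|^RW .*"vsan://\([0-9a-fA-F-]*\)".*|\1|p' "$@"
}

# Print the hidden lock file name that corresponds to a UUID.
lck_name_for_uuid() {
    printf '.%s.lck\n' "$1"
}
```

For a descriptor with a single extent, lck_name_for_uuid "$(vsan_uuid_from_descriptor VMDiskName.vmdk)" prints the name of the lock file to inspect in the steps that follow.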

The following command shows all .<uuid>.lck files within the vSAN namespace directory:

# ls -lah .*.lck

You'll see something similar to this:

-rw------- 1 root root 0 Jul 13 2017 .e7c66759-680f-e86b-798d-a0369fa131f0.lck

There may also be non-hidden lock files, which you can diagnose similarly by running the following:

# ls -lah *.lck

Run vmfsfilelockinfo -p .e7c66759-680f-e86b-798d-a0369fa131f0.lck to show the lock details for this vSAN object:

vmfsfilelockinfo Version 2.0
Looking for lock owners on ".e7c66759-680f-e86b-798d-a0369fa131f0.lck"
".e7c66759-680f-e86b-798d-a0369fa131f0.lck" is locked in Exclusive mode by host having mac address ['xx:xx:xx:xx:xx:xx']
Trying to make use of Fault Domain Manager
----------------------------------------------------------------------
Found 6 ESX hosts using Fault Domain Manager.
----------------------------------------------------------------------
Searching on Host esxi1
Searching on Host esxi3
Searching on Host esxi4
Searching on Host esxi2
Searching on Host esxi6
Searching on Host esxi5
    MAC Address : xx:xx:xx:xx:xx:xx


Host owning the lock on file is esxi5, lockMode : Exclusive
Total time taken : 0.11339905299246311 seconds.


If no lock is found, the output looks like this:
vmfsfilelockinfo Version 2.0
Looking for lock owners on ".e7c66759-680f-e86b-798d-a0369fa131f0.lck"
".e7c66759-680f-e86b-798d-a0369fa131f0.lck" is not locked by any ESX host and is Free
Total time taken : 0.037906300276517868 seconds.
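
When checking many lock files, it can help to reduce the vmfsfilelockinfo output to a single line. The following is a minimal sketch that assumes the output wording matches the two examples above; summarize_lockinfo is a hypothetical helper name.

```shell
# Hypothetical helper: summarize vmfsfilelockinfo output read from stdin.
# Prints "<host> <lockMode>" when an owning host is reported, "free" when
# the tool reports that the file is not locked, and nothing otherwise.
summarize_lockinfo() {
    sed -n \
        -e 's/^Host owning the lock on file is \(.*\), lockMode : \(.*\)$/\1 \2/p' \
        -e 's/^.* is not locked by any ESX host and is Free$/free/p'
}
```

Typical use would be vmfsfilelockinfo -p .e7c66759-680f-e86b-798d-a0369fa131f0.lck | summarize_lockinfo.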


Alternatively, you can also run the command vmkfstools -D against this file, which will show the lock details for this vSAN object as well.

Example:

# vmkfstools -D .e7c66759-680f-e86b-798d-a0369fa131f0.lck

You should see output similar to this:

Lock [type 10c00001 offset 152799232 v 830, hb offset 3969024
gen 215, mode 1, owner 5c576ea9-e19f62dc-07eb-a0369fa12052 mtime 1107249
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr <4, 354, 1>, gen 3, links 1, type reg, flags 0, uid 0, gid 0, mode 600
len 0, nb 0 tbz 0, cow 0, newSinceEpoch 0, zla 4305, bs 8192


The last segment of the owner field (a0369fa12052 in this example, that is a0:36:9f:a1:20:52) is the MAC address of the management VMkernel port. It should correspond to a host in the vSAN cluster.
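
Extracting the MAC address from the owner field by hand is error-prone, so a small filter can help. This is a sketch that assumes the owner field has the four-part form shown in the example output; owner_to_mac is a hypothetical helper name.

```shell
# Hypothetical helper: read vmkfstools -D output on stdin and print the
# last 12 hex digits of the owner field as a colon-separated MAC address.
owner_to_mac() {
    sed -n 's/.*owner [0-9a-f]*-[0-9a-f]*-[0-9a-f]*-\([0-9a-f]\{12\}\).*/\1/p' |
        sed 's/\(..\)/\1:/g; s/:$//'
}
```

Typical use would be vmkfstools -D .e7c66759-680f-e86b-798d-a0369fa131f0.lck | owner_to_mac. The result can then be compared with the VMkernel MAC addresses listed on each host, for example with esxcfg-vmknic -l.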

Note: During the life cycle of a powered-on virtual machine, several of its files transition between various legitimate lock states. The lock mode indicates the type of lock that is on the file. The lock modes are:
  • mode 0 = no lock
  • mode 1 = exclusive lock (for example, the vmx file of a powered-on virtual machine, the currently used disk (flat or delta), *vswp, and so on)
  • mode 2 = read-only lock (for example, on the ..-flat.vmdk of a running virtual machine with snapshots)
  • mode 3 = multi-writer lock (for example, used for MSCS cluster disks or FT VMs)
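
The mode list above can be applied to vmkfstools -D output with a short filter. This is a sketch under the assumption that the lock mode appears as ", mode N, owner" as in the example output; both function names are hypothetical.

```shell
# Hypothetical helper: read vmkfstools -D output on stdin and print the
# lock mode number. Matching ", mode N, owner" skips the unrelated
# permission field ("mode 600") on the Addr line.
lock_mode_from_output() {
    sed -n 's/.*, mode \([0-9]*\), owner.*/\1/p'
}

# Hypothetical helper: translate a lock mode number into the description
# used in this article.
lock_mode_name() {
    case "$1" in
        0) echo "no lock" ;;
        1) echo "exclusive lock" ;;
        2) echo "read-only lock" ;;
        3) echo "multi-writer lock" ;;
        *) echo "unknown mode $1" ;;
    esac
}
```

Typical use would be lock_mode_name "$(vmkfstools -D .e7c66759-680f-e86b-798d-a0369fa131f0.lck | lock_mode_from_output)".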

 

Once you have the name of the host owning the lock, SSH into that host and restart the management services hostd and vpxa with the following command: /etc/init.d/hostd restart && /etc/init.d/vpxa restart

If the lock is still present, run lsof |grep <vmname> && ps|grep <vmname>. For example:
[root@esxi4:~] lsof |grep cent7_2 && ps|grep cent7_2
7565528     vmx                   FILE                       43   /vmfs/volumes/vsan:52bea6daf62777db-6515bb0268f25523/18db7d62-56b6-8186-64ba-0050560181e8/cent7_2.vmx.lck
7565528     vmx                   FILE                       44   /vmfs/volumes/vsan:52bea6daf62777db-6515bb0268f25523/18db7d62-56b6-8186-64ba-0050560181e8/cent7_2.vmx
7565528     vmx                   FILE                       45   /vmfs/volumes/vsan:52bea6daf62777db-6515bb0268f25523/18db7d62-56b6-8186-64ba-0050560181e8/cent7_2.vmx~
7565528     vmx                   FILE                       82   /vmfs/volumes/vsan:52bea6daf62777db-6515bb0268f25523/18db7d62-56b6-8186-64ba-0050560181e8/cent7_2.nvram
7565529  0        vmm0:cent7_2
7565533  0        vmm1:cent7_2
7565535  7565528  vmx-filtPoll:cent7_2
7565536  7565528  vmx-mks:cent7_2
7565537  7565528  vmx-svga:cent7_2
7565538  7565528  vmx-vcpu-0:cent7_2
7565540  7565528  vmx-vcpu-1:cent7_2


The number in the first column of the lsof output (7565528 in this example) is the world process ID. You can kill this process by running kill <PID>. Make sure you run this command only on the host or hosts the VM is NOT registered to.
Note: If the VM is powered down there should be no open files (lsof) or active processes (ps) for the VM. Additionally, you should only see open files or active processes on the host the VM is registered to when the VM is powered on.
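
To pick out the process ID without reading the listing by eye, the lsof output can be filtered. This is a sketch that assumes the ID is the first column, as in the example above; vm_world_ids is a hypothetical helper name. It only prints IDs and never kills anything.

```shell
# Hypothetical helper: read lsof output on stdin and print the distinct
# IDs (first column) of the lines that mention the VM name given as $1.
vm_world_ids() {
    awk -v vm="$1" 'index($0, vm) { print $1 }' | sort -u
}
```

Typical use would be lsof | vm_world_ids cent7_2. Review the result against the registration note above before running kill on any ID.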

If you find no locks with either of the lock commands, run lsof |grep <vmname> && ps|grep <vmname> on all hosts in the cluster to see whether a process for the VM exists on more than one host. If so, kill the process on any host that has a hung process related to the VM.
Note: Make sure you're only killing the process on hosts the VM is NOT registered to especially if the VM is powered on.
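
Checking every host by hand is tedious in larger clusters. The following is a minimal sketch of a loop that runs the same check on a list of hosts. It assumes SSH access as root to each host; check_vm_on_hosts and the REMOTE_SH variable are hypothetical names introduced here, and the host names are supplied by you.

```shell
# Hypothetical helper: run the lsof/ps check for one VM on several hosts.
# $1 is the VM name, the remaining arguments are host names.
# REMOTE_SH is an assumption of this sketch; it defaults to ssh.
check_vm_on_hosts() {
    vm="$1"; shift
    for host in "$@"; do
        echo "== ${host} =="
        # ";" instead of "&&" so that ps still runs when lsof finds nothing.
        ${REMOTE_SH:-ssh} "root@${host}" "lsof | grep '${vm}'; ps | grep '${vm}'" </dev/null ||
            echo "(no process found or connection failed)"
    done
}
```

For example, check_vm_on_hosts cent7_2 esxi1 esxi2 esxi3 prints a section per host. The sketch only reports; deciding which process to kill remains a manual step.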

If neither vmfsfilelockinfo -p nor vmkfstools -D finds a lock, lsof |grep <vmname> && ps|grep <vmname> finds no active process for the VM on any host, and you are still getting file lock errors, you are dealing with a phantom lock. A rolling reboot of the cluster is required to clear it.

Workaround:
To check all VM files and vSAN object lock files at once, and to see which of them are locked and by which host, run the following command in the VM directory:

for file in *; do echo ${file}; vmfsfilelockinfo -p ${file} |grep -i mode; done

Output Example:

Test-3f9d789c.hlog
Test-ec315dde.vswp
Test-ec315dde.vswp.lck
"Test-ec315dde.vswp.lck" is locked in Exclusive mode by host having mac address ['00:XX:56:XX:11:XX']
Host owning the lock on file is <Hostname>, lockMode : Exclusive
Test.nvram
"Test.nvram" is locked in Exclusive mode by host having mac address ['00:XX:56:XX:11:XX']
Host owning the lock on file is <Hostname>, lockMode : Exclusive
Test.vmdk
Test.vmsd
Test.vmx


Normally, the output shows the owner host as the lock holder. If a different host is listed, note the name of that host.

To check all .<uuid>.lck files, run the following command:
for file in .*lck; do echo ${file}; vmfsfilelockinfo -p ${file} |grep -i mode; done

To check all the files for VMs that have spaces in the name, run the following command:
for file in *; do echo "${file}"; vmfsfilelockinfo -p "${file}" |grep -i mode; done
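
The three loops above can be combined into one that covers regular and hidden lock files, tolerates spaces in names, and prints only the files that report a lock. This is a sketch rather than a supported tool; report_locks is a hypothetical function name.

```shell
# Hypothetical helper: run in the VM directory. Prints each file that
# vmfsfilelockinfo reports as locked, followed by the matching lines.
report_locks() {
    for file in * .*.lck; do
        # Skip patterns that matched nothing and were left unexpanded.
        [ -e "$file" ] || continue
        # grep fails when no "mode" line is present, i.e. no lock reported.
        info=$(vmfsfilelockinfo -p "$file" 2>/dev/null | grep -i mode) || continue
        printf '%s\n%s\n' "$file" "$info"
    done
}
```

Unlike the loops above, files without a lock produce no output, which keeps the listing short for VMs with many files.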





Additional Information

See the following related KB articles:

Committing snapshots when there are no snapshot entries in the Snapshot Manager
Investigating virtual machine file locks on ESXi


Impact/Risks:
VMs fail to power on.
Snapshots fail to delete or consolidate.
VM fails to clone or vMotion.