Virtual Machine shuts down after a disk consolidation due to a locked file
search cancel

Virtual Machine shuts down after a disk consolidation due to a locked file

book

Article ID: 381876

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

A virtual machine shut down is observed in the hostd log on ESXi:

Log location: /var/log/run/hostd.log

<YYYY-MM-DD>T<time>Z verbose hostd[2102782] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:<UUID>/<VMname>.vmx opID=lqntbmzn-19795825-auto-bsak2-h5:71916833-1b-01-75-85e9] Handling vmx message 96789784: Failed to lock the file
--> Cannot open the disk 'vm:/vmfs/volumes/<UUID>/<VMname>.vmdk' or one of the snapshot disks it depends on.
--> An operation required the virtual machine to quiesce and the virtual machine was unable to continue running

The vmware.log for the VM and hostd.log on ESXi show a NFC open (nbdmode) occurring while the VM is attempting to lock the base disk. 

Log Location: /var/log/run/hostd.log

2024-11-03T10:15:44.448Z info hostd[2103955] [Originator@6876 sub=Libs opID=nbdmode-000000c04c570270] OBJLIB-VSANOBJ: VsanObjOpenInt: Device '/vmfs/devices/vsan/<object>' not present, retrying to create one.
2024-11-03T10:15:44.456Z info hostd[2103955] [Originator@6876 sub=DiskLib opID=nbdmode-000000c04c570270] DISKLIB-LINK  :DiskLinkGetSpaceUsedInfo: Failed to get the file system unique id for file vsan://<volume>/<object>, using id as "remote"

Log Location: /vmfs/volumes/<VM's volume>/vmware.log

Note: To find the VM's logging location use Locating virtual machine log files on an ESXi host

2024-11-01T10:15:44.413Z In(05) vcpu-0 - Closing disk 'scsi0:1'

...

2024-11-01T10:15:44.413Z In(05) vcpu-0 - DISKLIB-LIB   : DiskLibUnlinkInt: Disk delete successfully completed { result:0, Msg: 'The operation completed successfully', fileName:'/vmfs/volumes/vsan:<UUID>/<VM Name>-000001.vmdk'}
2024-11-03T10:15:44.413Z In(05) vcpu-0 - Closing disk 'scsi0:2'

**NFC open happens at this time. scsi0:1 disk is still closed. At this point, the scsi0:1 delta is deleted and it points to the base disk.

2024-11-03T10:15:44.497Z In(05) vcpu-0 - DISK: OPEN scsi0:1 'vmfs/volumes/vsan:<UUID>/<VM Name>_1.vmdk' persistent R[]
2024-11-03T10:15:44.498Z Er(02) vcpu-0 - OBJLIB-VSANOBJ: VsanObjLock: Object '<object>' locked, retrying.

Note: The object in the hostd logs will match the object locked in the VM's logs. 

Environment

This issue can occur on all versions of ESXi.

Cause

The issue is caused due to timing issues with backup vendor API calls. NBD mode is a transport mode that a backup software can use to access the disk in a snapshot to copy the data for backup. Generally, this is used between the snapshot take and snapshot delete + consolidate. 

It is rare but possible for a nbdmode open to come in after a consolidate succeeded but before the VM resumed. This causes a read lock on the base disk that the VM requires to lock (exclusively) for the VM to continue running. 

Resolution

There is no resolution to this issue. A VM requires a small switch over window after a snapshot consolidation from the previous disk to the base disk. If a lock is acquired by something else during this window, the VM cannot continue running. 

It is recommended to engage the backup vendor to evaluate the timing of access calls to the disk vs snapshot operations. It is expected that nbdmode ops come in after a snapshot create and are closed before a snapshot delete operation.