Virtual machine stops responding with the error: The lock protecting virtualdisk.vmdk has been lost

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

Symptoms:

A powered-on, running virtual machine suddenly goes into an inconsistent state or powers off unexpectedly.
The following warnings are observed in /vmfs/volumes/datastore/VM/vmware.log:

The lock protecting virtualdisk.vmdk has been lost. This is most likely due to underlying storage having problems, resulting in this virtual machine getting powered on at another ESX host as well. This virtual machine needs to be powered off at this host now. Kindly confirm that the virtual machine is running successfully on another host before clicking the OK button

OR

The lock protecting test.vmdk has been lost, possibly due to underlying storage issues. If this virtual machine is configured to be highly available, ensure that the virtual machine is running on some other host before clicking OK

When you click OK, the virtual machine shuts down.
This issue is observed on ESXi hosts in HA-enabled clusters. It can also occur in non-HA setups when the datastores hosting the virtual machines are shared across ESXi hosts.
In HA-enabled clusters, the virtual machine is powered on again on another host in the cluster.
"State in doubt" and driver abort events like the following are seen in /var/run/log/vmkwarning.log at the same time as the lock-lost warnings in /vmfs/volumes/datastore/VM/vmware.log:

2025-09-03T12:53:34.357Z Wa(180) vmkwarning: cpu105:2097367)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:235: NMP device "naa.#########################" state in doubt; requested fast path state update...

2025-09-03T12:53:34.358Z In(182) vmkernel: cpu69:2098652)ScsiDeviceIO: 4605: Cmd(0x45dae6d89e00) 0x8a, cmdId.initiator=0x43100d4142c0 CmdSN 0x3a5 from world28176742 to dev "naa.naa.#########################" failed H:0x8 D:0x0 P:0x0 Cancelled from driver layer

2025-09-03T12:53:34.360Z In(182) vmkernel: cpu108:2097406)lpfc: lpfc_handle_status:5631: vmhba4 3271: FCP cmd x8a failed <6/4> sid #####, did #####, oxid ####iotag xc18 Abort Requested Host Abort Req
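When collecting evidence for the storage vendor, it can help to pull these events out of a copy of the log together with their timestamps. The following is a minimal sketch, not a supported tool: the regular expressions are modelled only on the sample lines above, the function names are hypothetical, and real log formats may differ between ESXi builds and drivers.

```python
import re
from datetime import datetime

# Patterns modelled on the sample log lines in this article (assumption:
# other builds or drivers may word these messages differently).
STATE_IN_DOUBT = re.compile(r'NMP device "(?P<device>[^"]+)" state in doubt')
DRIVER_ABORT = re.compile(r'to dev "(?P<device>[^"]+)" failed H:0x8 ')


def parse_timestamp(line):
    """Return the leading UTC timestamp of a log line, or None if absent."""
    token = line.split(" ", 1)[0]
    try:
        return datetime.strptime(token, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return None


def find_storage_events(lines):
    """Yield (timestamp, kind, device) for state-in-doubt and abort lines."""
    patterns = (("state_in_doubt", STATE_IN_DOUBT),
                ("driver_abort", DRIVER_ABORT))
    for line in lines:
        for kind, pattern in patterns:
            match = pattern.search(line)
            if match:
                yield parse_timestamp(line), kind, match.group("device")
                break
```

A typical use would be to open a copy of vmkwarning.log, pass the file object to find_storage_events, and compare the resulting timestamps with the lock-lost messages in vmware.log.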

Environment

VMware ESXi 7.x
VMware ESXi 8.x

Cause

The most common reasons for a failure to update the disk locks are intermittent SAN or network issues, such as unreachable storage or high latency, which prevent I/O from completing on time.

Messages indicating that the lock was lost during a timed-out heartbeat are seen in /var/run/log/vmkernel.log:

2025-09-03T12:53:34.361Z In(182) vmkernel: cpu45:26119008)HBX: 5928: 'DS#': HB at offset #####- Cancelling all threads waiting for reclaim of HB:

2025-09-03T12:53:34.361Z In(182) vmkernel: cpu72:27655942)HBX: 3089: 'DS#': HB at offset #####- Waiting for timed out HB:

2025-09-03T13:27:35.776Z In(182) vmkernel: cpu64:27351814)DLX: 1773: vol 'DS#': 'Exclusive' lock at ##### was lost during a timedout HB.
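To show how long a datastore heartbeat had been timed out before a lock was reported lost, the two message types can be paired per volume. This is a hedged sketch: the patterns are taken only from the sample lines above, the function name lock_loss_delays is hypothetical, and the volume label is whatever appears between the quotes in the log.

```python
import re
from datetime import datetime

# Patterns modelled on the sample vmkernel.log lines in this article
# (assumption: wording may differ on other ESXi builds).
HB_TIMEOUT = re.compile(
    r"HBX: \d+: '(?P<vol>[^']+)': HB at offset .*Waiting for timed out HB")
LOCK_LOST = re.compile(
    r"DLX: \d+: vol '(?P<vol>[^']+)': '(?P<mode>[^']+)' lock at .* "
    r"was lost during a timedout HB")


def _timestamp(line):
    """Parse the leading UTC timestamp of a vmkernel.log line."""
    return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S.%fZ")


def lock_loss_delays(lines):
    """Map each volume to the seconds between its first timed-out
    heartbeat message and each later lock-lost message."""
    first_timeout = {}
    delays = {}
    for line in lines:
        match = HB_TIMEOUT.search(line)
        if match:
            # Keep only the earliest timeout seen for this volume.
            first_timeout.setdefault(match.group("vol"), _timestamp(line))
            continue
        match = LOCK_LOST.search(line)
        if match and match.group("vol") in first_timeout:
            delta = _timestamp(line) - first_timeout[match.group("vol")]
            delays.setdefault(match.group("vol"), []).append(
                delta.total_seconds())
    return delays
```

For the sample lines above, the gap between the timed-out heartbeat (12:53:34.361Z) and the lost lock (13:27:35.776Z) is roughly 34 minutes, which is the kind of figure the storage vendor can compare with array-side events.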

Panic events similar to the following are seen in /vmfs/volumes/Datastore#/vmware.log:

2025-09-03T13:27:35.904Z In(05) vcpu-14 - MsgQuestion: msg.hbacommon.locklost reply=0

2025-09-03T13:27:35.904Z Cr(01) vcpu-14 - PANIC: Exiting because of failed disk operation.
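To confirm that a virtual machine exited because of a lost disk lock, a copy of its vmware.log can be searched for the lock-lost question and the disk panic. The sketch below is illustrative only: the patterns follow the sample entries above, the function name summarize_vmware_log is hypothetical, and the search works on the whole text so that entries are found even if several records share one line.

```python
import re

# Each pattern captures (timestamp, thread) from entries shaped like the
# samples in this article (assumption: other builds may format them
# differently).
LOCK_LOST_QUESTION = re.compile(
    r"(\d{4}-\d{2}-\d{2}T[\d:.]+Z) \S+ (\S+) - "
    r"MsgQuestion: msg\.hbacommon\.locklost")
DISK_PANIC = re.compile(
    r"(\d{4}-\d{2}-\d{2}T[\d:.]+Z) \S+ (\S+) - "
    r"PANIC: Exiting because of failed disk operation")


def summarize_vmware_log(log_text):
    """Return the lock-lost questions and disk panics found in log_text."""
    return {
        "lock_lost_questions": LOCK_LOST_QUESTION.findall(log_text),
        "disk_panics": DISK_PANIC.findall(log_text),
    }
```

If both lists are non-empty and the timestamps line up with the lock-lost messages in vmkernel.log, the power-off is consistent with the cause described in this article.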

If an ESXi host loses access to a datastore, I/O from running virtual machines on the datastore will time out and fail. The virtual machine pauses and an event message appears stating that the virtual machine lost access to its disk. This problem might occur in these situations:

The host's storage network connection is not restored within 15 seconds and another host breaks the disk lock.

This is expected behavior, because the VMFS clustered lock manager lets a host send I/O to resources such as virtual disk files only if the host owns the resource through a lock. This is required so that guest data remains consistent when other hosts might try to access the same data. For more information, see "Host Connectivity Degraded in ESXi" warning in vCenter Server.

The host is part of a VMware HA cluster and loses connectivity to its management and storage networks (Isolation Event).

In this case, VMware HA attempts to restart virtual machines on a 'healthier' host in the cluster. If HA is configured to leave virtual machines powered on when isolated, the virtual machines on the isolated host are failed over, but the original virtual machines remain running on the isolated host (without the VMDK locks). When the isolated host rejoins the cluster, the duplicate virtual machines running on it fail to reacquire the disk locks and the event message appears.

A node that is isolated from the network needs time to release the virtual machine's VMFS locks if the host's isolation response is to fail over the virtual machines or to leave them powered on.

In HA-enabled clusters, the virtual machine is powered on again on another host. In non-HA clusters, you can power the virtual machine back on manually after it has powered off.

Resolution

Engage the storage vendor to investigate the I/O completion delays further, so that extended disk-locking issues can be avoided.

Additional Information

For related information, see: