This failure is seen when snapshot consolidation is requested while third-party backup software is still actively holding one or more of the virtual machine's disks open.
Similar events are recorded in the host management agent log (/var/run/log/hostd.log):
Hostd[<WORLD_ID>]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/<NFS_VOLUME_UUID>/<VM_HOME_FOLDER>/<VM_NAME>.vmx] Handling vmx message nnnnnn: Locking conflict for file "/vmfs/volumes/<NFS_VOLUME_UUID>/<VM_HOME_FOLDER>/<VM_NAME>-000001-delta.vmdk". Kernel open flags are 0x8. Owner process on this host is world ID nnnnnn with world name vmx-vcpu-0:VM_NAME.
Hostd[<WORLD_ID>]: --> Failed to lock the file
Hostd[<WORLD_ID>]: --> Cannot open the disk '/vmfs/volumes/<NFS_VOLUME_UUID>/<VM_HOME_FOLDER>/<VM_NAME>-flat.vmdk' or one of the snapshot disks it depends on.
Hostd[<WORLD_ID>]: --> An operation required the virtual machine to quiesce and the virtual machine was unable to continue running.
Hostd[<WORLD_ID>]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event nnnn : Error message on VM_NAME on ESX_SERVER_FQDN in ha-datacenter: An operation required the virtual machine to quiesce and the virtual machine was unable to continue running.
Hostd[<WORLD_ID>]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event nnnn : Virtual machine VM_NAME disks consolidation failed on ESX_SERVER_FQDN in cluster CLUSTER_NAME in ha-datacenter.
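To confirm this condition on the host, the hostd log can be searched for these messages. A minimal check, assuming shell access to the ESXi host and the default log location:
# Search hostd.log for the lock conflict and consolidation failure messages
grep -iE "Locking conflict|Failed to lock the file|consolidation failed" /var/run/log/hostd.log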
If the VM was restarted by HA, the following entries can be observed in /var/run/log/fdm.log:
[YYYY-MM-DDTHH:MM:SS] Db(167) Fdm[2107401] [Originator@6876 sub=Invt opID=WorkQueue-######] Vm /vmfs/volumes/###############/######/######.vmx changed guestHB=red
[YYYY-MM-DDTHH:MM:SS] In(166) Fdm[2107406] [Originator@6876 sub=Invt opID=WorkQueue-######] Vm /vmfs/volumes/###############/######/######.vmx curPwrState=powered on curPowerOnCount=1 newPwrState=powered off clnPwrOff=false hostReporting=__localhost__
[YYYY-MM-DDTHH:MM:SS] Db(167) Fdm[2107406] [Originator@6876 sub=Invt opID=WorkQueue-######] Vm /vmfs/volumes/###############/######/######.vmx localhost: local power state=powered off; global power state=powered off
[YYYY-MM-DDTHH:MM:SS] Db(167) Fdm[2107406] [Originator@6876 sub=Invt opID=WorkQueue-######] vm /vmfs/volumes/###############/######/######.vmx from __localhost__ changed inventory cleanPwrOff=0
[YYYY-MM-DDTHH:MM:SS] Db(167) Fdm[2107406] [Originator@6876 sub=Invt opID=WorkQueue-######] Vm /vmfs/volumes/###############/######/######.vmx changed guestHB=gray
[YYYY-MM-DDTHH:MM:SS] Db(167) Fdm[2107403] [Originator@6876 sub=Execution opID=host-######:##:########-#] Failing over vm /vmfs/volumes/###############/######/######.vmx (isRegistered=true)
[YYYY-MM-DDTHH:MM:SS] Db(167) Fdm[2107403] [Originator@6876 sub=Execution opID=host-######:##:########-#] Registering vm done (vmid=/vmfs/volumes/###############/######/######.vmx, hostdVmId=)
[YYYY-MM-DDTHH:MM:SS] Db(167) Fdm[2107403] [Originator@6876 sub=Execution opID=host-######:##:########-#] Reconfiguring vm
[YYYY-MM-DDTHH:MM:SS] Db(167) Fdm[2107567] [Originator@6876 sub=Invt opID=WorkQueue-######] GuestKernelCrashed is false for VM 38
[YYYY-MM-DDTHH:MM:SS] Db(167) Fdm[2107567] [Originator@6876 sub=Invt opID=WorkQueue-######] VM 38: Updated GuestKernelCrashed!
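If HA restarted the VM, the failover can likewise be confirmed from the FDM log. A minimal check, assuming the default log location:
# Confirm that FDM failed over and re-registered the VM
grep -E "Failing over vm|Registering vm done" /var/run/log/fdm.log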
Powering on the Failed VM
To power on the VM, the locks on the parent disks must be released by force-closing the stale open file handles. This can be done by either:
Or
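As an illustration only (not necessarily one of the options listed above), the world holding the lock can be identified from the owner world ID reported in hostd.log and, if it belongs to a stale backup session, stopped so the lock is released. A sketch using standard esxcli commands; the world ID 1234567 is a placeholder:
# List running VM worlds and match the owner world ID reported in hostd.log
esxcli vm process list
# If that world belongs to the stale backup session, stop it to release the lock (placeholder world ID)
esxcli vm process kill --type soft --world-id 1234567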
Permanent Fix: To prevent this failure, patch the ESXi hosts to ESXi 8.0 Update 3e, where the issue is fixed.
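Before scheduling the patch, the current version and update level of each host can be verified. A minimal check, assuming shell access:
# Display the installed ESXi version, build, and update level
esxcli system version get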
Workaround:
Per-VM and per-host workarounds are available to disable the NFS lock-upgrading functionality that causes this issue.
Per VM:
The workaround disables the NFS lock-upgrading functionality by setting the VM configuration option "consolidate.upgradeNFS3Locks" to "FALSE", i.e. add the following line to the VM configuration (.vmx) file:
consolidate.upgradeNFS3Locks = "FALSE"
This requires powering off the VM, setting the option, and powering the VM back on. Follow Tips for editing a .vmx file to update the VM configuration file.
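A command-line sketch of the per-VM workflow, assuming shell access to the host; the VM ID 38 and the datastore/VM paths are placeholders:
# Find the VM ID and the path to its .vmx file
vim-cmd vmsvc/getallvms
# Power off the VM before editing its configuration
vim-cmd vmsvc/power.off 38
# Append the workaround option to the .vmx file (placeholder path)
echo 'consolidate.upgradeNFS3Locks = "FALSE"' >> /vmfs/volumes/DATASTORE/VM_NAME/VM_NAME.vmx
# Reload the configuration so hostd picks up the change, then power the VM back on
vim-cmd vmsvc/reload 38
vim-cmd vmsvc/power.on 38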
Per Host:
If the host-wide option is set while the host is in maintenance mode, the change is picked up automatically as VMs are migrated back to the host. Recommended steps (a command-line sketch follows the list):
- Put the host in maintenance mode.
- SSH into the host and edit /etc/vmware/config to add the following line:
consolidate.upgradeNFS3Locks = "FALSE"
- Exit maintenance mode. From this point, as VMs migrate back, they will pick up the host-level configuration.
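A command-line sketch of the per-host steps above, assuming shell access and that VMs have already been evacuated or powered off:
# Enter maintenance mode
esxcli system maintenanceMode set --enable true
# Append the host-wide workaround to the host configuration file
echo 'consolidate.upgradeNFS3Locks = "FALSE"' >> /etc/vmware/config
# Verify the line was added
grep upgradeNFS3Locks /etc/vmware/config
# Exit maintenance mode so VMs can migrate back and pick up the setting
esxcli system maintenanceMode set --enable false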