Snapshot consolidation does not complete even though the task status becomes completed
Article ID: 372356

Products

VMware vSphere ESXi 7.0, VMware vSphere ESXi 8.0, VMware vSphere ESXi

Issue/Introduction

The VM home page displays the warning message "This virtual machine needs to have its disk consolidated". If consolidation is initiated, the task changes to the completed state, but the banner remains.

When a VM snapshot-based backup solution is configured, the VM may reach the maximum number of snapshots and display the error below:

"This virtual machine has 255 or more redo logs in a single branch of its snapshot tree. The maximum supported limit has been reached, creating new snapshots will not be allowed. To create new snapshots, please delete old snapshots or consolidate the redo logs."

Environment

VMs with snapshots

Cause

One of the delta disks in the snapshot tree may be locked by a process other than vmx.

When a snapshot consolidation is attempted, the vmkernel logs the warning below.

 

2024-04-02T10:24:03.232Z In(182) vmkernel: cpu37:12224567 opID=70de03f1)FSS: 6818: Conflict between buffered and unbuffered open (file 'RHEL-DB-SVR-000004-delta.vmdk'):flags 0x4002, requested flags 0x8
2024-04-02T10:24:03.535Z In(182) vmkernel: cpu11:12224567 opID=70de03f1)FSS: 6818: Conflict between buffered and unbuffered open (file 'RHEL-DB-SVR-000004-delta.vmdk'):flags 0x4002, requested flags 0x8
2024-04-02T10:24:03.836Z In(182) vmkernel: cpu19:12224567 opID=70de03f1)FSS: 6818: Conflict between buffered and unbuffered open (file 'RHEL-DB-SVR-000004-delta.vmdk'):flags 0x4002, requested flags 0x8
2024-04-02T10:24:03.139Z In(182) vmkernel: cpu10:12224567 opID=70de03f1)FSS: 6818: Conflict between buffered and unbuffered open (file 'RHEL-DB-SVR-000004-delta.vmdk'):flags 0x4002, requested flags 0x8

 

The failures above indicate that the snapshot consolidation operation is trying to open /vmfs/volumes/................./RHEL-DB-SVR/RHEL-DB-SVR-000004-delta.vmdk in exclusive mode for writing (requested flags 0x8 mean FILEOPEN_EXCLUSIVE) while the file is already open in unbuffered read-only mode (flags 0x4002 mean FILEOPEN_DIRECT|FILEOPEN_READONLY).
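To see at a glance which files are affected, the warning lines can be summarized with a short script. The following is a minimal sketch, assuming the log lines match the format quoted above; the function name summarize_open_conflicts and the regular expression are illustrative and not part of any VMware tooling.

```python
import re
from collections import Counter

# Pattern modeled on the vmkernel warning lines quoted in this article.
# Assumption: other ESXi builds may word or punctuate the message differently.
CONFLICT_RE = re.compile(
    r"Conflict between buffered and unbuffered open "
    r"\(file '(?P<file>[^']+)'\):\s*flags (?P<held>0x[0-9a-fA-F]+), "
    r"requested flags (?P<requested>0x[0-9a-fA-F]+)"
)


def summarize_open_conflicts(lines):
    """Count open-conflict warnings per (file, held flags, requested flags)."""
    counts = Counter()
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            key = (match["file"], match["held"], match["requested"])
            counts[key] += 1
    return counts
```

The function accepts any iterable of lines, for example an open handle on a copy of /var/log/vmkernel.log. A file that appears repeatedly with the same flag pair is the delta disk to examine in the lock investigation.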

 


2024-04-02T10:24:03.720Z In(166) Hostd[7668443]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 72387 : Warning message on RHEL-DB-SVR on host-abc.xxxx.com in ha-datacenter: This virtual machine has 255 or more redo logs in a single branch of its snapshot tree. The maximum supported limit has been reached, creating new snapshots will not be allowed. To create new snapshots, please delete old snapshots or consolidate the redo logs.
2024-04-02T10:24:03.736Z Db(167) Hostd[7668443]: [Originator@6876 sub=Vigor.Vmsvc.vm:/vmfs/volumes/.........../RHEL-DB-SVR/RHEL-DB-SVR.vmx] Consolidate Disks translated error to vim.fault.FileLocked
2024-04-02T10:24:03.737Z Db(167) Hostd[7668443]: [Originator@6876 sub=Vigor.Vmsvc.vm:/vmfs/volumes/.........../RHEL-DB-SVR/RHEL-DB-SVR.vmx] Consolidate Disks message: Consolidation failed for disk node 'scsi0:0': Failed to lock the file.
2024-04-02T10:24:03.737Z Db(167) Hostd[2099647]: --> Consolidation failed for disk node 'scsi0:1': Failed to lock the file.
2024-04-02T10:24:03.737Z Db(167) Hostd[2099647]: --> An error occurred while consolidating disks: Failed to lock the file.
2024-04-02T10:24:03.737Z Db(167) Hostd[2099647]: -->
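When a VM has many disks, it can help to list which disk nodes failed and why. The sketch below assumes the hostd messages follow the "Consolidation failed for disk node" wording shown above; failed_disk_nodes is a hypothetical helper name.

```python
import re

# Pattern modeled on the hostd lines quoted in this article (assumption:
# the message wording is the same on the ESXi build being examined).
DISK_FAIL_RE = re.compile(
    r"Consolidation failed for disk node '(?P<node>[^']+)': (?P<reason>.+?)\s*$"
)


def failed_disk_nodes(lines):
    """Return {disk node: failure reason} for each consolidation failure."""
    failures = {}
    for line in lines:
        match = DISK_FAIL_RE.search(line)
        if match:
            failures[match["node"]] = match["reason"]
    return failures
```

For the excerpt above, this reports scsi0:0 and scsi0:1, both with the reason "Failed to lock the file.", which points to a file lock rather than a space or permission problem.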

Resolution

Determine which locks are placed on the file using the vmfsfilelockinfo command.

vmfsfilelockinfo -p  /vmfs/volumes/.........../RHEL-DB-SVR/RHEL-DB-SVR-000004-delta.vmdk

Refer to the KB https://knowledge.broadcom.com/external/article/314365/investigating-virtual-machine-file-locks.html to understand how to interpret the lock information.

 

If the vmfsfilelockinfo command returns more than one host that has locked the file, log in to each host and determine which process is holding the lock with the "lsof" command.

 

[root@<hostname>:/vmfs/volumes/.........../RHEL-DB-SVR] lsof | grep -i RHEL-DB-SVR-000004-delta.vmdk
75643218    vmx                   FILE                      373   /vmfs/volumes/.........../RHEL-DB-SVR/RHEL-DB-SVR-000004-delta.vmdk
3055148     hostd                 FILE                      202   /vmfs/volumes/.........../RHEL-DB-SVR/RHEL-DB-SVR-000004-delta.vmdk
3055148     hostd                 FILE                      206   /dev/deltadisks/3344f3f6-RHEL-DB-SVR-000004-delta.vmdk

Apart from the vmx process, hostd holds another, read-only lock. VMX is the virtual machine's own process and should be left untouched.

The command "vmfsfilelockinfo -p  /vmfs/volumes/.........../RHEL-DB-SVR/RHEL-DB-SVR-000004-delta.vmdk" may display only one host lock on the file because vmx and hostd run on the same system. If the hostd holding the read-only lock runs on a different host, the command returns the MAC addresses of both hosts.

hostd.log may indicate which client is holding the lock, depending on how the file is being accessed.
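The same check can be scripted against saved lsof output. This is a minimal sketch, assuming the five-column layout seen in the sample above (an ID, a process name, a type, a file descriptor, and a path); other_holders is a hypothetical helper name.

```python
def other_holders(lsof_lines, filename, keep="vmx"):
    """List (id, process name) pairs that hold `filename` open, excluding `keep`.

    Assumption: each data row has five whitespace-separated columns, as in
    the sample output in this article. Rows that do not fit are skipped.
    """
    holders = set()
    for line in lsof_lines:
        fields = line.split(None, 4)
        # Skip shell prompts, headers, and anything not starting with a numeric ID.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        ident, name, _type, _fd, path = fields
        if filename in path and name != keep:
            holders.add((int(ident), name))
    return sorted(holders)
```

For the sample output, the only entry returned is hostd with ID 3055148. The vmx entry is filtered out because it is the VM's own process.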

2024-04-01T03:33:28.573Z In(166) Hostd[3055148]: [Originator@6876 sub=Libs opID=000000ab23d53e810] [NFC INFO]NfcServerProcessClientMsg: Authenticity of the NFC client verified. IP: X.X.X.X
2024-04-01T03:33:28.622Z In(166) Hostd[3055148]: [Originator@6876 sub=Libs opID=nbdmode-000000ab23d53e810]  [NFC INFO]NfcFile_Open: session=AB470722D0 hdl=AB33C4D220 Local filename = '/vmfs/volumes/.........../RHEL-DB-SVR/RHEL-DB-SVR-000004-delta.vmdk'
2024-04-01T03:33:28.622Z In(166) Hostd[3055148]: [Originator@6876 sub=Libs opID=nbdmode-000000ab23d53e810]  [NFC INFO]Nfc_RegisterFileHandle: sessionId=AB470722D0 fh=AB33C4D220(/vmfs/volumes/.........../RHEL-DB-SVR/RHEL-DB-SVR-000004-delta.vmdk)

If the lock was acquired long ago and the logs have since rolled over, the client IP cannot be determined.
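If the relevant entries are still in the log, the client IP and the opened file can be matched by opID. The sketch below is illustrative only. It assumes, based solely on the three sample lines above, that the NfcFile_Open entry carries the same opID as the client-verification entry with an "nbdmode-" prefix, and that the verification entry is logged first. The function name nfc_clients_for_file is hypothetical.

```python
import re

OPID_RE = re.compile(r"opID=(?P<opid>[^\]\s]+)")
CLIENT_RE = re.compile(r"Authenticity of the NFC client verified\. IP: (?P<ip>\S+)")
OPEN_RE = re.compile(r"NfcFile_Open: .*Local filename = '(?P<path>[^']+)'")

# Assumption taken from the sample lines: the file-open entry repeats the
# session opID with this prefix. Other access modes may use other prefixes.
MODE_PREFIX = "nbdmode-"


def nfc_clients_for_file(lines, filename):
    """Return sorted client IPs whose NFC session opened a path containing `filename`."""
    ip_by_opid = {}
    clients = set()
    for line in lines:
        opid_match = OPID_RE.search(line)
        if not opid_match:
            continue
        opid = opid_match["opid"]
        if opid.startswith(MODE_PREFIX):
            opid = opid[len(MODE_PREFIX):]
        client = CLIENT_RE.search(line)
        if client:
            # Remember which client IP belongs to this opID.
            ip_by_opid[opid] = client["ip"]
            continue
        opened = OPEN_RE.search(line)
        if opened and filename in opened["path"] and opid in ip_by_opid:
            clients.add(ip_by_opid[opid])
    return sorted(clients)
```

An empty result does not prove that no NFC client holds the file. It can also mean the entries have rolled out of the log or that the file is being accessed by another path.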

 

If the client that is locking the file is identified, log in to the client and gracefully stop the process that holds the lock. If the client cannot be determined, restart hostd:

/etc/init.d/hostd restart

Warning: If hostd is restarted without gracefully stopping the process that holds the lock, that process may be terminated abruptly. For example, if a migration server is holding the lock and hostd is restarted, the migration process may fail.

Restarting hostd does not impact the VM's running state.