VMware vSphere ESXi 7.0
VMware vSphere ESXi 8.0
VMware vCenter Server 7.x
VMware vCenter Server 8..x
Backup workflow
- Backup software copies data, it will temporarily access the disk, leading to a read lock during the copy.
If backup still has active read on VM Disks and a consolidation request comes in, then consolidations will fail.
hostd.log
YYYY-MM-DDTHH:MM:SS.XXXZ info hostd[XXXXXXX] [Originator@XXXX sub=Libs opID=nbdmode-XXXXXXX*** ] Setting NFC log level to 0
YYYY-MM-DDTHH:MM:SS.XXXZ trivia hostd[XXXXXXX] [Originator@XXXX sub=PropertyCollector opID=nbdmode-XXXXXXX ***]
TriggerProcessGUReqs: Session ########-####-####-####-############
YYYY-MM-DDTHH:MM:SS.XXXZ trivia hostd[XXXXXXX] [Originator@XXXX sub=PropertyJournal opID=nbdmode-XXXXXXX ***] Notified a listener
on ha-host for path "dataTransportConnectionInfo"
YYYY-MM-DDTHH:MM:SS.XXXZ trivia hostd[XXXXXXX] [Originator@XXXX sub=PropertyProvider opID=nbdmode-XXXXXXX ***] RecordOpInt called
for dataTransportConnectionInfo.
YYYY-MM-DDTHH:MM:SS.XXXZ trivia hostd[XXXXXXX] [Originator@XXXX sub=PropertyProvider opID=nbdmode-XXXXXXX ***]
[RecordAndNotifyChange] Removing stored value for property dataTransportConnectionInfo
YYYY-MM-DDTHH:MM:SS.XXXZ verbose hostd[XXXXXXX] [Originator@XXXX sub=PropertyProvider opID=nbdmode-XXXXXXX ***] Recordop ASSIGN:
dataTransportConnectionInfo, ha-host. Sent notification immediately.
YYYY-MM-DDTHH:MM:SS.XXXZ trivia hostd[XXXXXXX] [Originator@XXXX sub=PropertyCollector opID=nbdmode-XXXXXXX ***] ProcessGUReqs
Start: Session ########-####-####-####-############
YYYY-MM-DDTHH:MM:SS.XXXZ trivia hostd[XXXXXXX] [Originator@XXXX sub=PropertyCollector opID=nbdmode-XXXXXXX ***] ProcessGUReqs End:
Session ########-####-####-####-############ (1 filter updates, 1 GUReqs)
YYYY-MM-DDTHH:MM:SS.XXXZ trivia hostd[XXXXXXX] [Originator@XXXX sub=DiskLib opID=nbdmode-XXXXXXX ***] DISKLIB-LIB : Removing
obsolete flag OPEN_BUFFERED.
YYYY-MM-DDTHH:MM:SS.XXXZ trivia hostd[XXXXXXX] [Originator@XXXX sub=DiskLib opID=nbdmode-XXXXXXX ***] DISKLIB-VMFS :
"/vmfs/volumes/datastore/vmname-flat.vmdk" : open successful (14) size = XXXXXXXXXXXX, hd = XXXXXXXXXX. Type 3
YYYY-MM-DDTHH:MM:SS.XXXZ trivia hostd[XXXXXXX] [Originator@XXXX sub=DiskLib opID=nbdmode-XXXXXXX ***] DISKLIB-LIB_BLOCKTRACK :
Resuming from change tracking info file /vmfs/volumes/datastore/vmname/vmname-ctk.vmdk.
There is an operation with opID nbdmode-XXXXXXX that opened the disk at YYYY-MM-DDTHH:MM:SS.XXXZ. The "(14)" is the disk open flags includes the flags for opening with a read only lock. The operation closes the disk at YYYY-MM-DDTHH:MM:SS.XXXZ. During this time, any consolidates will fail because of the read lock. ***
These logs had the session close because of a network error. Based on the need to restart hostd to workaround the issue, the session was never closed.
Increase Logging Level:
1. SSH into the host.
2. Increase hostd main log level and history:
$ vim-cmd hostsvc/advopt/update Config.HostAgent.log.level string trivia$ esxcli system syslog config logger set --id=hostd --rotate=20 --size=50000
3. Increase NFCLIB logging. From How to change configuration settings for hostd service in vSphere ESXi 7.0 Update 2 and later
3a. Dump hostd internal config to a file:$ /bin/configstorecli config current get -c esx -g services -k hostd -outfile tmp.json
3b. Open file in vi:
$ vi tmp.json
3c: Find the section titled "nfcsvc". Within it, find the line "log_level": "INFO". Change INFO to DEBUG and save.
3d: Update the hostd internal config with the edited file:$ /bin/configstorecli config current set -c esx -g services -k hostd -infile tmp.json
4: Restart hostd:
$ /etc/init.d/hostd restart
5. Verify all the above commands worked by running this:$ cat /var/log/hostd.log | grep -i "NFC connection accept"
If the above steps worked, there will be an output similar to below:
cat /var/log/hostd.log | grep -i "NFC connection accept"YYYY-MM-DDTHH:MM:SS.XXXZ info hostd[XXXXXX] [Originator@XXXX sub=Libs] [NFC VERBOSE]NFC connection accept timeout: 180000 milliseconds
In some cases, restarting hostd will not be enough to clear the file lock due to the backup proxy VM actively retaining the file lock(s). When this occurs, the issue needs to be addressed by restarting the backup service on the backup proxy VM or rebooting that VM. You can track it down by running vmfsfilelockinfo -p locked_vmfk_file.vmdk to identify the IP address of the machine holding the lock.
More information on this kind of investigation can be found here: https://knowledge.broadcom.com/external/article/314365/investigating-virtual-machine-file-locks.html