Hostd[2099168] [Originator@6876 sub=IoTracker] In thread 2099146, fopen("/vmfs/volumes/########-########-####-############/<VM_name>/<VM_name>.vmx") took over 267 sec.
Hostd[2099144] [Originator@6876 sub=IoTracker] In thread 2099176, fopen("/vmfs/volumes/########-########-####-############/<VM_name>/<VM_name>.vmx") took over 1885 sec.
Hostd[2099179] [Originator@6876 sub=IoTracker] In thread 2099167, fopen("/vmfs/volumes/########-########-####-############/<VM_name>/<VM_name>.vmx") took over 1595 sec.
VMware vSphere ESXi 7.0.x
VMware vSphere ESXi 8.0.x
These symptoms may arise when there is a high degree of lock contention on the VMFS datastore.
Note that some lock contention is normal and expected on ESXi hosts accessing shared storage. Lock contention becomes a problem when a host fails to acquire a specific datastore lock for an extended period because the lock is held by another host.
Identify lock contention:
When a host accesses a lock at a specific offset, the lock version is incremented. If a lock shows the same version across repeated attempts to access it, the lock has been held and not released for the duration of those attempts.
Log entries of the following type will be seen in vmkernel.log:
<timestamp> cpu3:2315260)DLX: 4333: vol '<datastoreName>', lock at 248979456: [Req mode 1] Checking liveness:
<timestamp> cpu3:2315260)[type 10c00001 offset 248979456 v 137952457, hb offset 3440640
gen 371, mode 1, owner ########-########-####-############ mtime 50755437
num 0 gblnum 0 gblgen 0 gblbrk 0]
<timestamp> cpu0:4227286)DLX: 4985: vol '<datastoreName>', lock at 20955136: [Req mode: 1] Not free:
<timestamp> cpu0:4227286)[type 10c00002 offset 20955136 v 1672, hb offset 3735552
gen 8065, mode 1, owner #######-########-####-############ mtime 34117143
num 0 gblnum 0 gblgen 0 gblbrk 0] alloc owner 3735552
For example, the following command lists the number of attempts to access a lock (first column), the lock offset (second column) and the lock version (third column):
grep -Ei -A1 "checking liveness|not free" vmkernel.log | grep offset | awk '{print $7,$9}'|sort | uniq -c | sort -r | less
57 240992438 76,
49 240955673 108,
47 160571391 46,
43 187776960 59,
32 190488438 99,
...
Note: Log formats vary slightly between versions, so the awk field numbers may need adjusting to match the log format in use.
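A high attempt count against a single offset and version pair is the signature of a lock that has been held without release. As a minimal sketch, the listing above can be narrowed to the entries at or above a chosen attempt count. The function name filter_stuck_locks and the threshold of 45 are illustrative assumptions, not part of any VMware tooling, and the sample input is copied from the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): filter_stuck_locks THRESHOLD
# Reads "count offset version," lines on stdin, i.e. the shape of the
# listing above, and prints the entries whose attempt count is at least
# THRESHOLD. The trailing comma on the version field is removed.
filter_stuck_locks() {
    awk -v min="$1" '
        NF >= 3 && $1 ~ /^[0-9]+$/ {
            ver = $3
            sub(/,$/, "", ver)            # drop the trailing comma
            if ($1 + 0 >= min + 0)
                printf "offset=%s version=%s attempts=%d\n", $2, ver, $1
        }'
}

# Sample input copied from the listing above.
sample='57 240992438 76,
49 240955673 108,
47 160571391 46,
43 187776960 59,
32 190488438 99,'

# The threshold of 45 is an arbitrary value chosen for illustration.
printf '%s\n' "$sample" | filter_stuck_locks 45
```

In practice the grep pipeline shown earlier would be piped into the function instead of the sample text. A suitable threshold depends on how long a period the log covers.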
Identify if one host is predominantly the holder of locks under contention, e.g.:
grep -Ei -A2 "checking liveness|not free" vmkernel.* | grep owner | awk '{print $9}'|sort | uniq -c | sort -r
4773 <ESXi host UUID_1>
1 <ESXi host UUID_2>
1 <ESXi host UUID_3>
1 <ESXi host UUID_4>
Place this host into maintenance mode and reboot the host.
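Whether one host is "predominantly" the lock holder can be checked numerically from the owner tally. The following is a minimal sketch, assuming the tally has been reduced to "count owner" lines with a single-token owner value. The function name dominant_owner, the 90 percent cut-off and the host-uuid-N labels are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): dominant_owner PERCENT
# Reads "count owner" lines on stdin and prints the owner that accounts
# for at least PERCENT of all counted entries, followed by count/total.
# Returns 0 if such an owner exists, 1 otherwise.
dominant_owner() {
    awk -v pct="$1" '
        NF >= 2 && $1 ~ /^[0-9]+$/ {
            total += $1
            if ($1 + 0 > max) { max = $1 + 0; owner = $2 }
        }
        END {
            if (total > 0 && max * 100 >= pct * total) {
                printf "%s %d/%d\n", owner, max, total
                exit 0
            }
            exit 1
        }'
}

# Sample tally with the counts from the example above; the owner
# labels are placeholders standing in for real host UUIDs.
tally='4773 host-uuid-1
1 host-uuid-2
1 host-uuid-3
1 host-uuid-4'

# The 90 percent cut-off is an arbitrary value chosen for illustration.
printf '%s\n' "$tally" | dominant_owner 90
```

A non-zero return status indicates that no single owner reaches the cut-off, which corresponds to the case where no one host is responsible.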
If no single host is predominantly holding the contended locks: