ESXi hosts are not responding in vCenter due to a failure to allocate from objectCacheHeap.
After an attempt to restart the hostd process, the host becomes unresponsive again.
VMware vSphere ESXi 7.x
An Object Cache (OC) exists on filesystems (such as VMFS, NFS, DevFS) and stores cache data for all opened objects.
As long as the file or volume is being accessed, the corresponding OC object remains referenced. Once all references to the file or volume are closed, the related OC object is supposed to be flushed from the cache. However, in this case, the OC entries fail to be flushed for some reason.
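The reference behaviour described above can be pictured with a small toy model. This is only an illustrative sketch and not ESXi code: the class `ToyObjectCache`, the `flush_works` switch, and the helper `churn` are hypothetical names invented here, and the real reason the flush fails is not stated in this article.

```python
class ToyObjectCache:
    """Toy model: an entry stays cached while referenced and is flushed on last close."""

    def __init__(self, flush_works=True):
        self.refcounts = {}
        # Hypothetical switch used only to simulate entries that are never flushed.
        self.flush_works = flush_works

    def open(self, name):
        # Opening a file or volume adds a reference to its cache entry.
        self.refcounts[name] = self.refcounts.get(name, 0) + 1

    def close(self, name):
        # Closing drops a reference; the entry should be flushed at zero.
        self.refcounts[name] -= 1
        if self.refcounts[name] == 0 and self.flush_works:
            del self.refcounts[name]

    def cached_entries(self):
        return len(self.refcounts)


def churn(cache, count):
    """Open and immediately close `count` distinct objects."""
    for i in range(count):
        name = f"obj{i}"
        cache.open(name)
        cache.close(name)
    return cache.cached_entries()
```

In this model a healthy cache returns to zero entries after every open/close cycle, while a cache whose flush never succeeds keeps growing, which is the kind of growth that would eventually exhaust a fixed-size heap.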
This issue is resolved in vSphere ESXi 7.0 U3i, build 20842708.
Note: The heap address 0x430b66800000 shown below will be different in your environment.
/system/heaps/objectCache-0x430b66800000/> cat stats
Heap stats {
Name:objectCache
owning module id:0
dynamically growable:1
physical contiguity: 1 -> Any Physical Contiguity
lower memory PA limit:0
upper memory PA limit:-1
may use reserved memory:0
memory pool:228
# of ranges allocated:1
dlmalloc overhead:1032
current heap size:34099624
initial heap size:131072
current bytes allocated:34074808
current bytes available:24816
current bytes releasable:288
percent free of current size:0
percent releasable of current size:0
maximum heap size:34099624
maximum bytes available:24816
percent free of max size:0
lowest percent free of max size ever encountered:0
# of failure messages:0
number of succeeded allocations:927979155
number of failed allocations:680826 <-- Here
number of freed allocations:927913628
average size of an allocation:98304
number of requests we try to satisfy per heap growth:2
number of heap growth operations:53068
number of heap shrink operations:20687
Size of the physical pages backing this heap.:4096
}
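If the heap statistics above are saved as text, the relevant counters can be checked with a short script. The following is a minimal sketch, assuming the output is available as a plain string; the function names `parse_heap_stats` and `heap_is_exhausted` and the 1% free threshold are assumptions made for illustration, not part of any VMware tooling.

```python
SAMPLE_STATS = """Heap stats {
Name:objectCache
physical contiguity: 1 -> Any Physical Contiguity
current heap size:34099624
current bytes available:24816
maximum heap size:34099624
number of failed allocations:680826 <-- Here
Size of the physical pages backing this heap.:4096
}"""


def parse_heap_stats(text):
    """Parse 'key:value' lines from a heap stats dump into a dict."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue  # skips the "Heap stats {" and "}" lines
        key, _, value = line.partition(":")
        value = value.strip()
        # Keep only the first token so annotations such as "<-- Here" are dropped.
        token = value.split()[0] if value else ""
        try:
            stats[key.strip()] = int(token)
        except ValueError:
            stats[key.strip()] = value
    return stats


def heap_is_exhausted(stats, free_threshold_pct=1.0):
    """True if the heap is at its maximum size, nearly full, and has failed allocations."""
    size = stats["current heap size"]
    free_pct = 100.0 * stats["current bytes available"] / size
    at_max = size >= stats["maximum heap size"]
    failed = stats["number of failed allocations"] > 0
    return at_max and free_pct < free_threshold_pct and failed
```

Calling `heap_is_exhausted(parse_heap_stats(SAMPLE_STATS))` on the values shown in this article evaluates to `True`: the heap has reached its maximum size, less than 1% of it is free, and the failed allocation counter is non-zero.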
Workaround:
In /var/log/vmkernel.log, look for entries similar to the following, where 'TEST' is the name of an affected datastore.
$ grep -B3 -i evict vmkernel.log
2022-10-06T22:09:37.898Z cpu67:2097774)Res3: 2572: Failed to lock cluster 17 (typeID 6) after 10 tries, aborting: caller 0x4200086850f4 vol TEST
2022-10-06T22:09:37.898Z cpu67:2097774)WARNING: Vol3: 2848: 'TEST': Failed to clear journal address since JBC could not be Locked. This could result in leak of journal block at <type 6 addr 33554449>.
2022-10-06T22:09:37.898Z cpu67:2097774)WARNING: Vol3: 2916: 'TEST': Failed to clear journal address in on-disk HB. This could result in leak of journal block at <type 6 addr 33554449>.
2022-10-06T22:09:37.898Z cpu67:2097774)Vol3: 4191: Error closing the volume: Failure. Eviction fails.
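When several datastores are involved, the names can be pulled out of the log automatically. The sketch below mirrors the `grep -B3 -i evict` approach: it looks at the three lines preceding each "Eviction fails" message and extracts the volume name. The regular expressions are assumptions based only on the message formats shown in the excerpt above, and `affected_datastores` is a hypothetical helper name.

```python
import re

SAMPLE_LOG = """2022-10-06T22:09:37.898Z cpu67:2097774)Res3: 2572: Failed to lock cluster 17 (typeID 6) after 10 tries, aborting: caller 0x4200086850f4 vol TEST
2022-10-06T22:09:37.898Z cpu67:2097774)WARNING: Vol3: 2848: 'TEST': Failed to clear journal address since JBC could not be Locked. This could result in leak of journal block at <type 6 addr 33554449>.
2022-10-06T22:09:37.898Z cpu67:2097774)WARNING: Vol3: 2916: 'TEST': Failed to clear journal address in on-disk HB. This could result in leak of journal block at <type 6 addr 33554449>.
2022-10-06T22:09:37.898Z cpu67:2097774)Vol3: 4191: Error closing the volume: Failure. Eviction fails."""

# Patterns derived from the sample messages only; other builds may log differently.
LOCK_RE = re.compile(r"Failed to lock cluster \d+ .* vol (\S+)")
JOURNAL_RE = re.compile(r"Vol3: \d+: '([^']+)': Failed to clear journal address")


def affected_datastores(log_text, context=3):
    """Return sorted volume names found in the lines preceding 'Eviction fails'."""
    lines = log_text.splitlines()
    names = set()
    for i, line in enumerate(lines):
        if "eviction fails" not in line.lower():
            continue
        # Inspect up to `context` lines before the match, like grep -B3.
        for prev in lines[max(0, i - context):i]:
            match = LOCK_RE.search(prev) or JOURNAL_RE.search(prev)
            if match:
                names.add(match.group(1))
    return sorted(names)
```

For the excerpt above, `affected_datastores(SAMPLE_LOG)` yields a single name, `TEST`.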
Open an SSH session to the host and change into the directory /vmfs/volumes/TEST. The SSH session accessing this directory will then keep it open and work around the issue.