ESXi hosts show as not responding in vCenter because the host fails to allocate memory from the objectCache heap (objectCacheHeap).
After the hostd process is restarted, the host becomes unresponsive again.
VMware vSphere ESXi 7.x
An Object Cache (OC) exists on filesystems (such as VMFS, NFS, and DevFS) and stores cache data for all open objects.
As long as the file or volume is being accessed, the corresponding OC object remains referenced. Once all references to the file or volume are closed, the related OC object is supposed to be flushed from the cache. However, in this case the OC entries are not flushed, so the objectCache heap fills up and new allocations fail.
This issue is resolved in VMware ESXi 7.0 Update 3i (build 20842708).
Note: The heap address 0x430b66800000 will be different.
/system/heaps/objectCache-0x430b66800000/> cat stats
Heap stats {
   Name:objectCache
   owning module id:0
   dynamically growable:1
   physical contiguity: 1 -> Any Physical Contiguity
   lower memory PA limit:0
   upper memory PA limit:-1
   may use reserved memory:0
   memory pool:228
   # of ranges allocated:1
   dlmalloc overhead:1032
   current heap size:34099624
   initial heap size:131072
   current bytes allocated:34074808
   current bytes available:24816
   current bytes releasable:288
   percent free of current size:0
   percent releasable of current size:0
   maximum heap size:34099624
   maximum bytes available:24816
   percent free of max size:0
   lowest percent free of max size ever encountered:0
   # of failure messages:0
   number of succeeded allocations:927979155
   number of failed allocations:680826 <-- Here
   number of freed allocations:927913628
   average size of an allocation:98304
   number of requests we try to satisfy per heap growth:2
   number of heap growth operations:53068
   number of heap shrink operations:20687
   Size of the physical pages backing this heap.:4096
}
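The heap statistics above can also be checked programmatically. The following is a minimal Python sketch, not a VMware tool: it assumes the stats output has been saved as text with one "key:value" pair per line, and the function names parse_heap_stats and heap_looks_exhausted, as well as the "failed allocations above zero and zero percent free" rule, are illustrative assumptions rather than documented thresholds.

```python
# Abbreviated sample taken from the heap statistics shown in this article.
SAMPLE_STATS = """\
Heap stats {
   Name:objectCache
   physical contiguity: 1 -> Any Physical Contiguity
   upper memory PA limit:-1
   current heap size:34099624
   maximum heap size:34099624
   percent free of max size:0
   number of failed allocations:680826 <-- Here
}
"""


def parse_heap_stats(text):
    """Turn 'key:value' lines into a dict; integer values become int."""
    stats = {}
    for raw in text.splitlines():
        # Drop trailing annotations such as "<-- Here".
        line = raw.split("<--")[0].strip()
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines like "Heap stats {" and "}" carry no pair
        value = value.strip()
        is_int = value.lstrip("-").isdigit()
        stats[key.strip()] = int(value) if is_int else value
    return stats


def heap_looks_exhausted(stats):
    """Heuristic (assumption): failed allocations with no free space left."""
    failed = stats.get("number of failed allocations", 0)
    free_pct = stats.get("percent free of max size", 100)
    return failed > 0 and free_pct == 0


if __name__ == "__main__":
    parsed = parse_heap_stats(SAMPLE_STATS)
    print(parsed["number of failed allocations"], heap_looks_exhausted(parsed))
```

A host matching this symptom would show a non-zero failed-allocation count together with a heap that has already grown to its maximum size, as in the sample.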
Workaround:
The following entries appear in /var/log/vmkernel.log, where 'TEST' is the name of an affected datastore:
$ grep -B3 -i evict vmkernel.log
[YYYY-MM-DDTHH:MM:SS] cpu67:2097774)Res3: 2572: Failed to lock cluster 17 (typeID 6) after 10 tries, aborting: caller 0x4200086850f4 vol TEST
[YYYY-MM-DDTHH:MM:SS] cpu67:2097774)WARNING: Vol3: 2848: 'TEST': Failed to clear journal address since JBC could not be Locked. This could result in leak of journal block at <type 6 addr 33554449>.
[YYYY-MM-DDTHH:MM:SS] cpu67:2097774)WARNING: Vol3: 2916: 'TEST': Failed to clear journal address in on-disk HB. This could result in leak of journal block at <type 6 addr 33554449>.
[YYYY-MM-DDTHH:MM:SS] cpu67:2097774)Vol3: 4191: Error closing the volume: Failure. Eviction fails.
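When several datastores are involved, the names can be pulled out of the log automatically. The sketch below is a rough Python illustration that assumes the log lines follow the format shown above. The function name datastores_with_failed_eviction and the heuristic of attributing an "Eviction fails" line to the volume most recently named by the same world ID are assumptions, not documented behavior.

```python
import re

# Sample lines copied from the vmkernel.log excerpt in this article.
SAMPLE_LOG = """\
[YYYY-MM-DDTHH:MM:SS] cpu67:2097774)Res3: 2572: Failed to lock cluster 17 (typeID 6) after 10 tries, aborting: caller 0x4200086850f4 vol TEST
[YYYY-MM-DDTHH:MM:SS] cpu67:2097774)WARNING: Vol3: 2848: 'TEST': Failed to clear journal address since JBC could not be Locked. This could result in leak of journal block at <type 6 addr 33554449>.
[YYYY-MM-DDTHH:MM:SS] cpu67:2097774)Vol3: 4191: Error closing the volume: Failure. Eviction fails.
"""

WORLD_RE = re.compile(r"cpu\d+:(\d+)\)")                 # world ID after cpuNN:
QUOTED_VOL_RE = re.compile(r"Vol3: \d+: '([^']+)':")     # Vol3: 2848: 'TEST':
TRAILING_VOL_RE = re.compile(r"\bvol (\S+)\s*$")         # ... vol TEST
EVICT_RE = re.compile(r"Eviction fails", re.IGNORECASE)


def datastores_with_failed_eviction(lines):
    """Return datastore names linked to an 'Eviction fails' message.

    Heuristic (assumption): the eviction message belongs to the volume
    last mentioned by the same world ID.
    """
    last_volume = {}   # world ID -> most recently named volume
    affected = []
    for line in lines:
        world = WORLD_RE.search(line)
        if not world:
            continue
        wid = world.group(1)
        named = QUOTED_VOL_RE.search(line) or TRAILING_VOL_RE.search(line)
        if named:
            last_volume[wid] = named.group(1)
        elif EVICT_RE.search(line) and wid in last_volume:
            if last_volume[wid] not in affected:
                affected.append(last_volume[wid])
    return affected


if __name__ == "__main__":
    print(datastores_with_failed_eviction(SAMPLE_LOG.splitlines()))
```

Each name returned corresponds to a directory under /vmfs/volumes/ that the workaround needs to keep open.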
Using an SSH session, change the directory to /vmfs/volumes/TEST. The SSH session accessing this directory then keeps it open, which works around the issue.