This issue is hard to diagnose because symptoms can appear only after the inventory has been in a seemingly stable state for months. Customers may not realize they are affected until OSDATA fills up and workloads are impacted.
The file "/var/run/log/clusterAgent.stderr" on the ESX host grows very large. This can exhaust the disk space on OSDATA, which locks up ESX services and VMs.
This issue was fixed in ESXi 8.0 Update 2b.
See the VMware ESXi 8.0 Update 2b Release Notes.
Workaround:
Until the hosts can be upgraded, monitor the affected file and clear it when needed. This requires SSH access to the hosts. To check the file, run:
stat /var/run/log/clusterAgent.stderr
If the file is absent or smaller than 1 MB, no action is needed. If it is larger, delete it periodically.
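If you want to check the file without deleting anything, you can script the size check. The following is a minimal sketch, not a VMware-provided tool. `check_stderr_size` is a hypothetical helper name. The sketch assumes a shell that supports `local` and `stat -c%s`, which is the same `stat` form used in the clean-up command in this article. The 1000000-byte default matches that command's threshold. The function prints a status line and returns non-zero when the file exceeds the limit.

```shell
#!/bin/sh
# Sketch only: report whether clusterAgent.stderr has grown past a threshold.
# check_stderr_size is a hypothetical helper name; it never deletes the file.
# Arguments (both optional): file path, size limit in bytes.
check_stderr_size() {
    local f="${1:-/var/run/log/clusterAgent.stderr}"
    local limit="${2:-1000000}"
    local size
    # A missing file means the problem is not present on this host.
    if [ ! -f "$f" ]; then
        echo "OK: $f is absent"
        return 0
    fi
    size=$(stat -c%s "$f")
    if [ "$size" -gt "$limit" ]; then
        echo "LARGE: $f is $size bytes (limit $limit)"
        return 1
    fi
    echo "OK: $f is $size bytes"
    return 0
}
```

Calling `check_stderr_size` with no arguments checks the default path against 1,000,000 bytes. A non-zero return value lets the function act as a condition in a larger monitoring script.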
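The following command clears the file. It acts only if the file exists and is larger than 1,000,000 bytes: it deletes the file and restarts the clusterAgent service. Because of this check, you can run it selectively on hosts where the problem is spotted, or unconditionally on all hosts.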
LF=/var/run/log/clusterAgent.stderr ; test -f $LF && [ $(stat -c%s $LF) -gt 1000000 ] && (rm -f $LF ; /etc/init.d/clusterAgent restart)
If the problem is not detected, the command produces no output. If it is detected, the output shows that the clusterAgent service has been restarted.
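When you run the command unconditionally on many hosts, you can wrap it in a loop over SSH. This is a sketch under stated assumptions, not part of the official workaround. It assumes that SSH is enabled on every host, that you log in as root, and that you pass host names as arguments. `clean_hosts` and `DRY_RUN` are hypothetical names. The remote command is the article's one-liner, copied unchanged.

```shell
#!/bin/sh
# Sketch only: apply the clean-up one-liner from this article to several hosts.
# Assumptions (not from the article): SSH enabled, root login, hosts given as
# arguments. clean_hosts and DRY_RUN are hypothetical names; DRY_RUN=1 prints
# the ssh commands instead of running them.

# Single quotes keep $LF and $(stat ...) unexpanded locally, so the remote
# ESXi shell evaluates them.
CLEANUP_CMD='LF=/var/run/log/clusterAgent.stderr ; test -f $LF && [ $(stat -c%s $LF) -gt 1000000 ] && (rm -f $LF ; /etc/init.d/clusterAgent restart)'

clean_hosts() {
    for host in "$@"; do
        echo "== $host"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh root@$host '$CLEANUP_CMD'"
            continue
        fi
        rc=0
        ssh "root@$host" "$CLEANUP_CMD" || rc=$?
        # ssh returns 255 on connection or authentication errors. Otherwise it
        # returns the remote status, which is non-zero when nothing needed clearing.
        if [ "$rc" -eq 255 ]; then
            echo "WARN: could not connect to $host" >&2
        fi
    done
}
```

For example, `DRY_RUN=1 clean_hosts esx01.example.com esx02.example.com` previews the commands, where the host names are placeholders. Dropping `DRY_RUN=1` runs them. The script treats only exit status 255 as a failure because the one-liner exits non-zero on healthy hosts.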