The clusterAgent service may slowly fill up OSDATA with logs, causing the host to lock up and all processes to become unresponsive

Article ID: 312111

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

The cause of this issue is not obvious, because symptoms can appear after the inventory has been in a seemingly stable state for months. Customers may be unaware that they are affected until OSDATA fills up and workloads are impacted.


Symptoms:

The file "/var/run/log/clusterAgent.stderr" on ESX becomes extremely large, potentially causing OSDATA to run out of disk space and locking up ESX services and VMs.

Environment

VMware vSphere ESXi 8.0.1
VMware vSphere ESXi 8.0.x
VMware vSphere ESXi 8.0.2

Cause

When it encounters certain rare error conditions, a component within the clusterAgent ESX service may start periodically writing to "/var/run/log/clusterAgent.stderr". Since this isn't expected, the file is not rotated or otherwise monitored. If the state persists over several months, the file can expand to several gigabytes in size, filling up the OSDATA partition.

Resolution

This issue was fixed in ESXi 8.0 Update 2b.

VMware ESXi 8.0 Update 2b Release Notes

Workaround:
The affected file should be monitored and cleared if needed. This requires SSH access to hosts.
stat /var/run/log/clusterAgent.stderr
If the file is absent or smaller than 1 MB, there is no need for concern. If it is larger, it should be deleted periodically.
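The size check described above can be sketched as a small shell function that only reads file metadata and changes nothing. This is a minimal sketch, not part of the official workaround: the function name `log_needs_clearing` and its optional threshold argument are hypothetical, and the default of 1000000 bytes is an assumption chosen to match the 1 MB guidance.

```shell
# Hypothetical helper: exits 0 when FILE exists and is larger than
# THRESHOLD bytes, and non-zero otherwise. It never modifies the file.
log_needs_clearing() {
    file="$1"
    threshold="${2:-1000000}"   # bytes; assumed default of roughly 1 MB
    [ -f "$file" ] || return 1  # absent file: nothing to clear
    size=$(stat -c %s "$file") || return 1
    [ "$size" -gt "$threshold" ]
}

if log_needs_clearing /var/run/log/clusterAgent.stderr; then
    echo "clusterAgent.stderr is larger than 1 MB"
else
    echo "clusterAgent.stderr is absent or small"
fi
```

Because the function only reports a status, it could be run on a schedule or across several hosts without touching the clusterAgent service.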
The following command can be used to clear out the file. It can be run either selectively on hosts where the problem is spotted, or unconditionally on all hosts.
LF=/var/run/log/clusterAgent.stderr ; test -f $LF && [ $(stat -c%s $LF) -gt 1000000 ] && (rm -f $LF ; /etc/init.d/clusterAgent restart)
If the problem is not detected, there will be no output. If the problem is detected, there will be messages indicating that the clusterAgent service has been restarted.
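For readers who prefer a commented, multi-line form, the logic of the one-line command can be sketched as follows. This is an illustrative rewrite under stated assumptions rather than the official procedure: the function name `clear_if_large` and its argument order are hypothetical, and the restart command is passed in as trailing arguments so that the same function can be tried out against a scratch file.

```shell
# Hypothetical helper: delete FILE and then run the given command only
# when FILE exists and exceeds THRESHOLD bytes. It prints nothing itself
# when no action is needed.
clear_if_large() {
    file="$1"
    threshold="$2"
    shift 2                       # remaining arguments form the restart command
    [ -f "$file" ] || return 0    # absent: nothing to do
    size=$(stat -c %s "$file") || return 1
    [ "$size" -gt "$threshold" ] || return 0
    rm -f "$file"
    "$@"                          # e.g. /etc/init.d/clusterAgent restart
}

# Same file, threshold, and restart command as the one-liner above.
clear_if_large /var/run/log/clusterAgent.stderr 1000000 \
    /etc/init.d/clusterAgent restart
```

As with the one-liner, the file is removed before the service is restarted, and a host without the problem produces no output.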