The clusterAgent service may slowly fill up OSDATA with logs.



Article ID: 312111



Products

VMware vSphere ESXi

Issue/Introduction

The cause of this issue is not obvious, because symptoms can appear only after the inventory has been in a seemingly stable state for months. Customers may be unaware that they are affected until OSDATA fills up and workloads are impacted.


Symptoms:

The file "/var/run/log/clusterAgent.stderr" on ESX becomes extremely large, potentially causing OSDATA to run out of disk space and locking up ESX services and VMs.


Environment

VMware vSphere ESXi 8.0.1
VMware vSphere ESXi 8.0.x
VMware vSphere ESXi 8.0.2

Cause

When it encounters certain rare error conditions, a component within the clusterAgent ESX service may start periodically writing to "/var/run/log/clusterAgent.stderr". Since this isn't expected, the file is not rotated or otherwise monitored. If the state persists over several months, the file can expand to several gigabytes in size, filling up the OSDATA partition.

Resolution

VMware is aware of this issue and working to resolve this in a future release.

Workaround:
The affected file should be monitored and cleared if needed. This requires SSH access to the hosts. To check the file's current size, run:
stat /var/run/log/clusterAgent.stderr
If the file is absent or smaller than 1 MB, there is no need for concern. If it is larger, it should be deleted periodically.
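The size check described above can be wrapped in a small helper for repeated use. This is a minimal sketch, not part of the official workaround: the function name check_log_size and its output words ("absent", "ok", "large") are hypothetical, and it assumes a stat that accepts -c %s, as the command in this article does. The default threshold of 1000000 bytes mirrors the value used by the clearing command in this article.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware-provided tool): classify a log file by size.
# Usage: check_log_size FILE [LIMIT_BYTES]
# Prints "absent", "ok <bytes>", or "large <bytes>".
check_log_size() {
    file=$1
    limit=${2:-1000000}   # threshold in bytes; assumed default, roughly 1 MB

    # A missing file means the service has not written anything to stderr.
    if [ ! -f "$file" ]; then
        echo "absent"
        return 0
    fi

    # stat -c %s prints the file size in bytes.
    size=$(stat -c %s "$file") || return 1

    if [ "$size" -gt "$limit" ]; then
        echo "large $size"
    else
        echo "ok $size"
    fi
}
```

On a host, it might be called as: check_log_size /var/run/log/clusterAgent.stderr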
The following command can be used to clear out the file. It can be run either selectively on hosts where the problem is spotted, or unconditionally on all hosts.
LF=/var/run/log/clusterAgent.stderr ; test -f "$LF" && [ "$(stat -c%s "$LF")" -gt 1000000 ] && (rm -f "$LF" ; /etc/init.d/clusterAgent restart)
If the file is absent or below the size threshold, the command produces no output. If the file exceeds the threshold, the command prints messages indicating that the clusterAgent service has been restarted.
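For readers who prefer to see the logic of the clearing command step by step, the following is an expanded sketch of the same checks. The function name clear_if_large and its optional second and third parameters are hypothetical additions for illustration; they let the threshold and the restart command be overridden so the logic can be tried on a scratch file before it is used on a host. The restart after deletion is kept from the original command, presumably so that the service releases its handle on the deleted file.

```shell
#!/bin/sh
# Hypothetical expanded form of the one-line workaround (illustrative sketch).
# Usage: clear_if_large FILE [LIMIT_BYTES] [RESTART_CMD]
clear_if_large() {
    file=$1
    limit=${2:-1000000}                                    # bytes; same threshold as the one-liner
    restart_cmd="${3:-/etc/init.d/clusterAgent restart}"   # default taken from the one-liner

    # Nothing to do if the file does not exist.
    [ -f "$file" ] || return 0

    # Nothing to do if the file is at or below the threshold.
    size=$(stat -c %s "$file") || return 1
    [ "$size" -gt "$limit" ] || return 0

    # Delete the oversized file, then restart the service.
    rm -f "$file"
    $restart_cmd   # left unquoted on purpose so the command and its argument are split
}
```

A dry run on a scratch copy could look like: clear_if_large /tmp/test.stderr 10 "echo restart-skipped"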