Extending the retention period of diagnostic logs (clomd, vsantrace) in vSAN environments

Products

VMware vSAN

Issue/Introduction

When an event such as a vSphere HA failover occurs, the default log settings can cause the clomd and vsantrace logs to rotate out within a short period (typically a few hours to less than a day). This can result in the loss of critical information required for Root Cause Analysis (RCA).

Symptoms include:

- If log collection is delayed for several days after an event, the logs from the time of the event have already been overwritten.
- The rate of log output outpaces the default retention settings (typically 8 rotations), so the retained logs do not cover the timeframe needed for troubleshooting.

This article provides steps to increase the log file size and rotation count to preserve diagnostic data for future occurrences.

Environment

VMware vSAN 7.0.x
VMware vSAN 8.0.x

Cause

The default log rotation settings (file size and generation count) are insufficient relative to the system's workload and high log generation rate.
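
As a rough illustration of this relationship, the retained history is approximately the rotation size multiplied by the rotation count, divided by the rate at which the log grows. The following is a minimal sketch; `retention_hours` is a hypothetical helper, and the 512 KiB/hour write rate is an assumed figure that must be measured on the actual host.

```shell
#!/bin/sh
# Estimate how many hours of history a rotated log can hold.
#   $1 = rotation size in KiB
#   $2 = number of rotations
#   $3 = observed log growth in KiB per hour (assumption: measure this yourself)
retention_hours() {
    size_kib=$1
    rotations=$2
    rate_kib_per_hour=$3
    # Integer division: the result is rounded down to whole hours.
    echo $(( size_kib * rotations / rate_kib_per_hour ))
}

# 1024 KiB x 8 rotations at an assumed 512 KiB/hour of output.
retention_hours 1024 8 512    # prints 16
# The same rate with the rotation count raised to 10.
retention_hours 1024 10 512   # prints 20
```

The estimate ignores the currently active log file and any burst of messages during an event, so treat it as an approximation only.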

Resolution

To prepare for future occurrences, implement the following configuration changes on each ESXi host:

1. Extend clomd log retention (syslog configuration)

The clomd log is managed by the ESXi syslog framework.

Verify the current configuration:

esxcli system syslog config logger list | grep -A4 "clomd.log"
Example Output:
   Destination: clomd.log
   ID: clomd
   Rotation Size: 1024
   Rotations: 8

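Increase the rotation count and, if required, the file size. The example below keeps the file size at 1024 KB and raises the rotation count from 8 to 10; specify a larger --size value to increase the file size as well: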

esxcli system syslog config logger set --id=clomd --size=1024 --rotate=10

Reload the syslog service to apply changes:
```
esxcli system syslog reload
```
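
After the reload, the new values can be confirmed by parsing the logger list instead of reading it by eye. The following is a minimal sketch; `logger_settings` is a hypothetical helper, and it assumes that awk is available in the ESXi shell and that the ID, Rotation Size, and Rotations fields appear in the order shown in the example output above.

```shell
#!/bin/sh
# Print "<rotation size> <rotations>" for one logger ID.
# Input: the output of `esxcli system syslog config logger list` on stdin.
logger_settings() {
    awk -v id="$1" '
        $1 == "ID:"                                        { current = $2 }
        $1 == "Rotation" && $2 == "Size:" && current == id { size = $3 }
        $1 == "Rotations:" && current == id                { print size, $2 }
    '
}

# Demonstration with sample text; on a host, pipe the real command instead:
#   esxcli system syslog config logger list | logger_settings clomd
logger_settings clomd <<'EOF'
   Destination: clomd.log
   ID: clomd
   Rotation Size: 1024
   Rotations: 10
EOF
```

With the sample text, the helper prints `1024 10`, matching the values passed to the set command.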

2. Relocate and extend vsantrace logs

vsantrace is managed by a framework separate from syslog. By default, traces are stored in the RAM disk (/vsantraces).
To increase the number of retained files without causing memory exhaustion, it is strongly recommended to redirect the output to a local persistent datastore.

Create a destination directory on a local persistent datastore (do not use the vSAN datastore):
```
mkdir /vmfs/volumes/[DatastoreName]/vsantraces_extended/
```
Modify the path and rotation counts (e.g., raising the base and LSOM verbose trace rotations to 64 and the DOM object trace rotations to 32):
```
esxcli vsan trace set -p /vmfs/volumes/[DatastoreName]/vsantraces_extended/ -f 64 -d 32 --lsom-verbose-num-files=64
```
- -f: Number of base trace files to rotate
- -d: Number of DOM object trace files to rotate
- --lsom-verbose-num-files: Number of LSOM verbose trace files to rotate

Verify that the new settings have been applied:

esxcli vsan trace get

Example Output: 
   VSAN Traces Directory: /vmfs/volumes/localdisk01/vsantraces_extended/
   Number Of Files To Rotate: 64
   Maximum Trace File Size: 45 MB
   Log Urgent Traces To Syslog: true
   Number of DOM Trace Files To Rotate: 32
   Maximum DOM Trace File Size: 10 MB
   Number of LSOM Trace Files To Rotate: 8
   Maximum LSOM Trace File Size: 22 MB
   Number of LSOM Verbose Trace Files To Rotate: 64
   Maximum LSOM Verbose Trace File Size: 22 MB
   Number Of PLOG Trace Files To Rotate: 8
   Maximum PLOG Trace File Size: 22 MB
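
The disk space that these settings can consume may be estimated by multiplying each rotation count by its maximum file size. The following is a minimal sketch; `trace_footprint_mb` is a hypothetical helper, and it assumes that awk is available, that every "To Rotate" line is followed by its matching "File Size" line, and that all sizes are reported in MB, as in the example output above.

```shell
#!/bin/sh
# Sum (number of files x maximum file size) over every trace type.
# Input: the output of `esxcli vsan trace get` on stdin.
trace_footprint_mb() {
    awk '
        /To Rotate:/ { files = $NF }                 # e.g. "Number Of Files To Rotate: 64"
        /File Size:/ { total += files * $(NF - 1) }  # e.g. "Maximum Trace File Size: 45 MB"
        END          { print total + 0 }
    '
}

# Demonstration with the example output; on a host, pipe the real command instead:
#   esxcli vsan trace get | trace_footprint_mb
trace_footprint_mb <<'EOF'
   Number Of Files To Rotate: 64
   Maximum Trace File Size: 45 MB
   Log Urgent Traces To Syslog: true
   Number of DOM Trace Files To Rotate: 32
   Maximum DOM Trace File Size: 10 MB
   Number of LSOM Trace Files To Rotate: 8
   Maximum LSOM Trace File Size: 22 MB
   Number of LSOM Verbose Trace Files To Rotate: 64
   Maximum LSOM Verbose Trace File Size: 22 MB
   Number Of PLOG Trace Files To Rotate: 8
   Maximum PLOG Trace File Size: 22 MB
EOF
```

For the example settings this prints 4960, that is, roughly 5 GB. Treat the figure as a rough upper bound, since actual usage depends on how full each file is when it rotates.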

3. Monitor log retention period

Observe the retention period of the generated log files over time and adjust the rotation parameters according to your actual environment's requirements.
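
One way to observe the retention period is to compare the modification times of the oldest and newest files in the log or trace directory. The following is a minimal sketch; `retention_span_hours` is a hypothetical helper, and it assumes that the `stat -c %Y` form is supported in the shell being used.

```shell
#!/bin/sh
# Print the span, in whole hours, between the oldest and newest file
# modification times in a directory. Prints 0 if the directory has no files.
retention_span_hours() {
    dir=$1
    oldest=""
    newest=""
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        mtime=$(stat -c %Y "$f")   # modification time as seconds since the epoch
        if [ -z "$oldest" ] || [ "$mtime" -lt "$oldest" ]; then oldest=$mtime; fi
        if [ -z "$newest" ] || [ "$mtime" -gt "$newest" ]; then newest=$mtime; fi
    done
    [ -n "$oldest" ] || { echo 0; return; }
    echo $(( (newest - oldest) / 3600 ))
}

# Example on a host (replace the placeholder with the real datastore name):
#   retention_span_hours /vmfs/volumes/[DatastoreName]/vsantraces_extended
```

The result is only an approximation of the covered window, because a file's modification time reflects its last write rather than its first entry. If the span is shorter than the time usually needed to collect logs after an event, increase the rotation counts further.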

4. Persist the configuration

Run the following command to ensure the changes persist across host reboots:

/sbin/auto-backup.sh

NOTE

- Ensure that the destination datastore has sufficient free capacity before increasing log sizes and rotations.
- Extremely high vsantrace rotation limits will significantly inflate the size of the ESXi log bundle (vm-support).
- To prevent log loss due to log rotation, it is highly recommended to collect logs as soon as possible following an event.