Aria Orchestrator/Automation /var/log is 100% full

Products

VMware Aria Suite

Issue/Introduction

VMware Aria Orchestrator/Automation (vRA/vRO) appliance /var/log file system shows as 100% used.

Environment

VMware Aria Orchestrator/Automation 8.x

Cause

Two main scenarios can fill the /var/log file system:

1. Improper configuration of log file rotation: If log rotation is not configured properly, log files can grow very large or accumulate in large numbers and, over time, consume all available space.
2. Creation of large Java memory dump (.hprof) files:
  - Depending on configuration and system usage, Aria Orchestrator/Automation can hit an OutOfMemory exception, which writes an .hprof file to disk. This can subsequently fill the drive.

Resolution

Use the methods below to clear and reclaim space. Take a snapshot of the appliance before executing the steps.

1. Identify Large Log Files:

- Start by identifying the largest files consuming space in /var/log:
  du -sh /var/log/* | sort -rh | head -20
- This command will list the largest files and directories, allowing you to pinpoint the culprits.
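Because du -sh on /var/log/* reports directories as a single total, it can help to also list the individual files above a size threshold. The following is a minimal sketch, not part of the original procedure: the function name large_files and its default threshold of 100 MB are assumptions chosen for illustration, and it relies on GNU find, du, and sort. It only reads the file system and changes nothing.

```shell
# Hypothetical helper: list regular files under DIR larger than MIN_MB
# megabytes, largest first. Read-only; nothing is deleted or modified.
large_files() {
    dir=${1:-/var/log}     # directory to scan (defaults to /var/log)
    min_mb=${2:-100}       # size threshold in MiB (assumed default)
    # -xdev keeps find on the same file system as DIR
    find "$dir" -xdev -type f -size +"${min_mb}"M -exec du -h {} + 2>/dev/null | sort -rh
}
```

For example, large_files /var/log 500 would list only files larger than 500 MB.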

2. Clear or Compress Large Log Files:

- Delete old logs: If certain log files are no longer needed (e.g., older rotated logs), you can safely delete them:
  sudo rm -f /var/log/<logfile>.gz
- Truncate large log files: For logs that are currently being written to, you can truncate the file to clear its contents without deleting it:
  sudo truncate -s 0 /var/log/<logfile>
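Before deleting rotated logs, it may be safer to review which archives are actually old. The sketch below is an illustration under stated assumptions: the function name old_rotated_logs and the 30-day default are hypothetical, and it assumes rotated logs carry a .gz suffix as in the rm example above. It only prints matching paths.

```shell
# Hypothetical helper: print compressed rotated logs under DIR that have
# not been modified for more than DAYS days. Nothing is deleted.
old_rotated_logs() {
    dir=${1:-/var/log}   # directory to scan (defaults to /var/log)
    days=${2:-30}        # age threshold in days (assumed default)
    find "$dir" -xdev -type f -name '*.gz' -mtime +"$days" -print
}
```

Once the list has been reviewed, the files it names can be removed with the rm command shown above.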

3. Use Log Rotation:

- Ensure log rotation is working: Log files should be rotated and compressed automatically by logrotate. Check the configuration in /etc/logrotate.conf or /etc/logrotate.d/.
- If necessary, manually trigger log rotation:
  sudo logrotate -f /etc/logrotate.conf
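To illustrate what a bounded rotation policy can look like, the sketch below writes an example logrotate stanza to a file of your choosing so that it can be inspected before anything is installed. The function name write_rotation_policy, the log path in the usage example, and the specific limits (7 rotations, 100M) are all assumptions for illustration and are not the appliance's shipped configuration.

```shell
# Hypothetical helper: write an example logrotate stanza for the log
# pattern $2 into the file $1. All values below are illustrative only.
write_rotation_policy() {
    # The heredoc expands $2 but does not glob-expand it.
    cat > "$1" <<EOF
$2 {
    daily
    rotate 7
    maxsize 100M
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}
```

For example, write_rotation_policy /tmp/example.conf '/var/log/example/*.log' writes the stanza, and logrotate -d /tmp/example.conf then runs logrotate in debug mode, which reports what it would do without changing any log files.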

4. Examine Specific VMware Logs:

- Aria Orchestrator and vRA logs: These logs may include large files. You can check:
  du -sh /var/log/vmware/* | sort -rh | head -20
- vRO-specific logs: Aria Orchestrator logs are typically found under /var/log/vmware/vco/. Clean unnecessary logs from there.
- If /var/log/journal is large, see step 8.

5. Remove Old Core Dumps or Unused Packages:

- Remove old core dumps and clear cached package files that may be consuming space:
  sudo find /var/crash -type f -exec rm -f {} \;
  sudo apt-get clean # For Ubuntu-based systems
  sudo yum clean all # For RHEL/CentOS-based systems
  sudo tdnf clean all # For Photon OS-based systems, such as the Aria appliance

6. Check for Orphaned Docker Images (Optional):

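- Remove stopped containers and all unused Docker images that may consume space. The command asks for confirmation before deleting; unused volumes are removed only if --volumes is added: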
  docker system prune -a

7. Delete the /service-logs/prelude/vco-app/file-logs/vco.hprof heap dump files, and write a script to delete vco.hprof files periodically.

Note: hprof files are heap dumps in binary format, which can be used for detailed analysis of memory-related problems in the Java stack. These are safe to delete and are not actively in use by a healthy running system.
Note: A cron job can be created to run on a schedule and remove large *.hprof files from the system.
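The periodic cleanup described in step 7 could be sketched as follows. This is a minimal illustration, not a supported script: the function name purge_hprof, the one-day age threshold, and the script path in the crontab comment are assumptions, while the default directory is the one named in step 7. It relies on the -delete action of GNU find.

```shell
# Hypothetical helper: delete *.hprof heap dumps under DIR that are older
# than DAYS days, printing each path as it is removed.
purge_hprof() {
    dir=${1:-/service-logs/prelude/vco-app/file-logs}  # path from step 7
    days=${2:-1}                                       # assumed age threshold
    [ -d "$dir" ] || return 0    # nothing to do if the directory is absent
    find "$dir" -xdev -type f -name '*.hprof' -mtime +"$days" -print -delete
}

# Example crontab entry (the script path is an assumption), running daily
# at 02:00 once this file is saved as a script ending with a purge_hprof call:
# 0 2 * * * /usr/local/bin/purge-hprof.sh
```

The age threshold is a design choice that leaves a recent heap dump in place in case it is still needed for analysis; using -mtime +0 instead removes dumps older than 24 hours.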

8. Trim the Journal Logs:

- Retain only the last X MB of journal logs with journalctl --vacuum-size=XMB. For example, to retain only the last 200MB of journal logs:
  journalctl --vacuum-size=200MB

Note: You may also opt to retain the last X days of logs with journalctl --vacuum-time=Xd. For example, to retain the last 2 days of journal logs: journalctl --vacuum-time=2d
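Vacuuming is a one-time cleanup, so the journal can grow back. One way to keep it bounded is a journald drop-in that sets SystemMaxUse. The sketch below is an assumption-laden illustration: the function name write_journal_cap, the file name size-cap.conf, and the 200M value are hypothetical, while /etc/systemd/journald.conf.d/ is the standard systemd drop-in directory.

```shell
# Hypothetical helper: write a journald drop-in that caps the disk space
# used by the persistent journal. File name and default size are assumptions.
write_journal_cap() {
    dropin_dir=${1:-/etc/systemd/journald.conf.d}  # standard drop-in directory
    max_use=${2:-200M}                             # assumed size cap
    mkdir -p "$dropin_dir" || return 1
    printf '[Journal]\nSystemMaxUse=%s\n' "$max_use" > "$dropin_dir/size-cap.conf"
}
```

After writing the drop-in, restart the service with systemctl restart systemd-journald so the limit takes effect, and check the result with journalctl --disk-usage.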