Aria Orchestrator Automation /var/log is 100% full
Article ID: 378276
Updated On:
Products
VMware Aria Suite
Issue/Introduction
VMware Aria Orchestrator/Automation (vRA/vRO) appliance /var/log file system shows as 100% used.
Slowness observed while accessing Aria Automation/Orchestrator UI.
After running deploy.sh to restart the service pods, the appliance UI may become completely unavailable, and you may receive a "bad gateway" error when trying to load it.
Running the command kubectl get pods -n prelude shows that no service pods have started:
root@<vRA FQDN> [ ~ ]# kubectl get pods -n prelude
NAME                   READY   STATUS              RESTARTS   AGE
no-license-app-<ID#>   0/1     ContainerCreating   0          9m24s
postgres-0             0/1     Init:0/1            0          9m23s
rabbitmq-ha-0          0/1     Init:0/3            0          9m23s
The journalctl/systemd.journal logs will show an error similar to:
<vRA FQDN> kubelet[Kubelet ID]: Create pod log directory for pod "no-license-app-<UUID>" failed: mkdir /var/log/pods/prelude_no-license-app-<UUID>: no space left on device
Environment
VMware Aria Orchestrator/Automation 8.x
Cause
Two main scenarios can fill the /var/log file system:
Improper configuration of log file rotation: If log rotation is not configured properly, log files can grow very large and accumulate in large numbers, which over time can consume all the available space.
Creation of large Java heap dump (.hprof) files:
Aria Orchestrator/Automation can run into an OutOfMemory exception, depending on configuration and system usage, which dumps an .hprof file to disk. This can subsequently fill the drive.
At least 20% free disk space is required on all partitions. If any partition is over 80% utilized, find the files that are taking up the disk space.
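The 80% rule above can be checked with a small helper. This is a minimal sketch, not part of the product: the function name partitions_over is hypothetical, and it assumes the POSIX output format of df -P (capacity in column 5, mount point in column 6).

```shell
#!/bin/sh
# Print "<mount point> <capacity>" for every partition above a threshold.
# Reads `df -P` formatted output on stdin, so it can also be fed saved output.
# partitions_over is a hypothetical helper name, not an appliance command.
partitions_over() {
    threshold="${1:-80}"    # percent; defaults to the 80% limit from the article
    awk -v limit="$threshold" '
        NR > 1 {                        # skip the df header line
            pct = $5
            sub(/%/, "", pct)           # strip the trailing percent sign
            if (pct + 0 > limit + 0) print $6 " " $5
        }'
}

# Usage on the appliance:
#   df -P | partitions_over 80
```

Mount points that contain spaces are not handled by this sketch.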
Resolution
Use the methods below to clear and reclaim space. Take a snapshot of the appliance before executing the steps.
1. Identify Large Log Files:
Start by identifying the largest files and directories consuming space in /var/log:
du -sh /var/log/* | sort -rh | head -20
This command will list the largest files and directories, allowing you to pinpoint the culprits.
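Because du -sh /var/log/* only summarizes the top-level entries, it can help to drill down to individual files. The following is a sketch under that assumption; largest_files is a hypothetical helper name, and sizes are reported in KiB as measured by du -k.

```shell
#!/bin/sh
# List the largest individual files below a directory, biggest first.
# largest_files is a hypothetical helper name used for illustration.
largest_files() {
    dir="${1:-/var/log}"
    count="${2:-20}"
    # -xdev keeps find on the same file system as $dir;
    # du -k prints "<size in KiB><TAB><path>" for each file found.
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null |
        sort -rn |
        head -n "$count"
}

# Usage on the appliance:
#   largest_files /var/log 20
```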
2. Clear or Compress Large Log Files:
Delete old logs: If certain log files are no longer needed (e.g., older rotated logs, older log bundles), you can safely delete them:
sudo rm -f /var/log/<logfile>.gz
Truncate large log files: For logs that are currently being written to, you can truncate the file to clear its contents without deleting it:
sudo truncate -s 0 /var/log/<logfile>
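Truncation is preferable for active logs because deleting a file that a process still holds open does not release its space until the process closes it. A minimal sketch of bulk truncation follows; truncate_large_logs is a hypothetical helper, and the *.log name pattern and size threshold are assumptions to adjust before use.

```shell
#!/bin/sh
# Truncate (not delete) every *.log file above a size threshold.
# truncate_large_logs is a hypothetical helper; review the matches first,
# for example by replacing "-exec truncate -s 0 {} +" with "-print".
truncate_large_logs() {
    dir="$1"
    min_kib="${2:-102400}"    # default threshold: larger than about 100 MiB
    find "$dir" -xdev -type f -name '*.log' -size +"${min_kib}k" \
        -exec truncate -s 0 {} +
}

# Usage on the appliance (run as root):
#   truncate_large_logs /var/log 102400
```

The files keep their names, owners, and permissions, so the writing services can continue logging without a restart.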
3. Use Log Rotation:
Ensure log rotation is working: Log files should be rotated and compressed automatically by logrotate. Check the configuration in /etc/logrotate.conf or /etc/logrotate.d/.
If necessary, manually trigger log rotation:
sudo logrotate -f /etc/logrotate.conf
4. Examine Specific VMware Logs:
Aria Orchestrator and vRA logs: These logs may include large files. You can check:
du -sh /var/log/vmware/* | sort -rh | head -20
vRO-specific logs: Aria Orchestrator logs are typically found under /var/log/vmware/vco/. Clean unnecessary logs from there.
If /var/log/journal is large, see step 8.
5. Remove Old Core Dumps or Unused Packages:
Check for core dumps or unused package files that may be consuming space:
sudo find /var/crash -type f -exec rm -f {} \;
sudo apt-get clean # For Ubuntu-based systems
sudo yum clean all # For RHEL/CentOS-based systems
6. Check for Orphaned Docker Images (Optional):
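Check for unused Docker images, containers, and volumes that may consume space:
docker system prune -a
Note: docker system prune -a removes stopped containers and all unused images. Unused volumes are removed only if the --volumes option is added.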
7. Delete the /service-logs/prelude/vco-app/file-logs/vco.hprof heap dump files, and write a script to delete the vco.hprof files periodically.
Note: hprof files are heap dumps in binary format, which can be used for detailed analysis of memory-related problems in the Java stack. They are safe to delete and are not actively used by a healthy running system.
Note: A cron job can be created to run on a schedule, check for large *hprof files on the system, and remove them.
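The periodic cleanup described in step 7 could look like the sketch below. The function name purge_hprof, the script path in the cron example, and the default age limit are all assumptions for illustration; only the directory comes from this article.

```shell
#!/bin/sh
# Remove heap dump files older than a given number of days.
# purge_hprof is a hypothetical helper; the age limit is an assumption so that
# a fresh dump stays available for analysis before it is removed.
purge_hprof() {
    dir="${1:-/service-logs/prelude/vco-app/file-logs}"
    min_age_days="${2:-0}"
    # -mtime +N matches files whose age, counted in whole 24-hour periods,
    # is greater than N; with N=0 that means older than 24 hours.
    find "$dir" -type f -name '*.hprof' -mtime +"$min_age_days" \
        -print -exec rm -f {} +
}

# Example crontab entry (hypothetical script path), running daily at 02:00.
# The script would contain this function followed by a call to purge_hprof:
#   0 2 * * * /usr/local/bin/purge-hprof.sh >/dev/null 2>&1
```

Each removed path is printed, which makes it easy to redirect the output to a log instead of /dev/null if an audit trail is wanted.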
8. Retain only the last X MB of journal logs with journalctl --vacuum-size=XM. For example, to retain only the last 200 MB of journal logs:
journalctl --vacuum-size=200M
Note: You may also opt to retain the last X days of logs with journalctl --vacuum-time=Xd. For example, to retain the last 2 days of journal logs:
journalctl --vacuum-time=2d
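Before vacuuming, it can be useful to confirm that the journal is actually the large consumer. The check below is a rough sketch: journal_over_limit is a hypothetical helper, and it measures the directory with du rather than using journald's own accounting, which journalctl --disk-usage reports.

```shell
#!/bin/sh
# Return success (0) when a journal directory uses more than limit_mb MiB.
# journal_over_limit is a hypothetical helper name used for illustration.
journal_over_limit() {
    dir="${1:-/var/log/journal}"
    limit_mb="${2:-200}"
    # du -sm prints the total size in MiB, rounded up; empty if dir is missing
    used_mb=$(du -sm "$dir" 2>/dev/null | awk '{print $1}')
    [ "${used_mb:-0}" -gt "$limit_mb" ]
}

# Usage on the appliance:
#   journalctl --disk-usage
#   if journal_over_limit /var/log/journal 200; then
#       echo "journal exceeds 200 MiB; run the vacuum command from step 8"
#   fi
```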