Aria Operations for Logs shows disconnected and has 100% utilization on the root partition
Article ID: 312243
Products
VMware Aria Suite
Issue/Introduction
This article describes the symptoms, cause, impact, and resolution of an issue where Aria Operations for Logs nodes become disconnected because a large .hprof file fills the root partition.
Symptoms:
One or more nodes show as disconnected in the UI.
Running "df -h" on the affected nodes shows that the root partition is full.
The file filling the partition is a ".hprof" Java heap dump that is very large and still growing.
The journalctl logs contain entries about NTP time drift.
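The first and last symptom checks can be scripted. The following is a minimal sketch, not an official procedure: the function names `root_usage_pct` and `recent_time_entries` are hypothetical, and the grep pattern is an assumption, since the exact wording of the time drift log entries is not given here.

```shell
#!/bin/sh
# Print the root partition usage as a bare integer percentage.
# df -P forces the portable output format (one line per filesystem).
root_usage_pct() {
    df -P / | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# List journal lines from the last hour that mention time synchronization.
# The pattern is an illustrative assumption; adjust it to the messages
# actually seen on the node.
recent_time_entries() {
    journalctl --no-pager --since "1 hour ago" | grep -i -E 'ntp|time|clock'
}

echo "Root partition usage: $(root_usage_pct)%"
```

Call `recent_time_entries` on the affected node to look for time drift messages.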
Environment
VMware Aria Operations for Logs 8.14.x and later
Cause
A Java process writes a heap dump (.hprof file) when it encounters a problem at runtime. In this case, time drift on a node causes the loginsight service to crash repeatedly, and the resulting heap dump fills the root partition.
Resolution
Find the .hprof file(s) using the following command:
find / -name \*.hprof -exec ls -lah {} \;
Then delete the .hprof file(s), if any, and restart the loginsight service by running:
service loginsight restart
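The find and delete steps can be wrapped in two small helpers so that the files are reviewed before anything is removed. This is a sketch under assumptions: the function names `list_hprof` and `remove_hprof` are hypothetical, and the `-xdev` option is an addition that keeps find on the filesystem of the starting directory, which differs from the command above.

```shell
#!/bin/sh
# List heap dumps under the given directory with human-readable sizes.
# -xdev keeps the search on the starting directory's filesystem.
list_hprof() {
    find "$1" -xdev -type f -name '*.hprof' -exec ls -lah {} \;
}

# Delete the heap dumps found under the given directory.
# Only regular files whose names end in .hprof are removed.
remove_hprof() {
    find "$1" -xdev -type f -name '*.hprof' -exec rm -f {} +
}
```

A possible sequence is `list_hprof /`, a review of the listed files, `remove_hprof /`, and then the `service loginsight restart` command shown above.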
To prevent the issue from recurring, ensure that the NTP server is reachable from each node and that the clocks on all nodes in the cluster differ by no more than a couple of seconds.
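One rough way to compare clocks between two nodes is to read each node's epoch time and take the absolute difference. The helper below is a sketch only; `drift_seconds` is a hypothetical name, and how the remote timestamp is collected (for example over SSH) is an assumption about the environment.

```shell
#!/bin/sh
# Print the absolute difference, in seconds, between two epoch timestamps.
drift_seconds() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}
```

For example, `drift_seconds "$(date +%s)" "$(ssh root@node2.example.com date +%s)"` compares the local node with a second node, where `node2.example.com` is a placeholder hostname. The SSH round trip adds some delay, so treat the result as an approximation.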
Additional Information
Impact/Risks:
Aria Operations for Logs nodes become unavailable, which affects the collection and analysis of logs.
High disk utilization can lead to performance degradation and potential data loss.