This information describes the symptoms, cause, impact, and resolution of the issue where Aria Operations for Logs nodes become disconnected due to high root partition utilization caused by a large .hprof file.
Symptoms:
Running "df -h" on the affected nodes shows that the root partition is full.
There is a ".hprof" Java heap dump with a very large and growing file size.
The journalctl logs contain entries about NTP time drift.
Environment:
VMware Aria Operations for Logs 8.14.x and later
Cause:
The heap dump is created when a Java process encounters a problem at runtime. In this case, because a node suffers from time drift, the loginsight service keeps crashing.
File(s) in the log directory can also occupy an unusually large amount of disk space. This could be due to issues with log file rotation.
Resolution:
Find the .hprof file(s) using the following command:
find / -name \*.hprof -exec ls -lah {} \;
Then delete the .hprof file(s), if any, and restart the loginsight service by running:
service loginsight restart
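Before deleting anything, it can help to see the heap dumps sorted by size. The following is a minimal sketch, not part of the product: the function name list_heap_dumps is hypothetical, and it assumes a find that supports -xdev and "-exec ... +" (both are standard in POSIX and GNU find). It stays on the filesystem of the starting directory, so with "/" it only reports dumps that actually consume root partition space.

```shell
# Hypothetical helper: list .hprof files under a directory (default: /),
# largest first. Output columns: size in KiB, then path.
list_heap_dumps() {
    dir="${1:-/}"
    # -xdev : do not descend into other mounted filesystems
    # du -k : report disk usage in 1024-byte units
    find "$dir" -xdev -type f -name '*.hprof' -exec du -k {} + 2>/dev/null | sort -rn
}
```

Calling list_heap_dumps with no argument searches from "/"; an empty result means no heap dump was found on the root filesystem.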
To prevent this issue from recurring, ensure that the NTP server is reachable from every node and that the clocks on all the nodes in the cluster differ by no more than a couple of seconds.
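One rough way to compare clocks across nodes is to read each node's epoch time over SSH and compute the difference. The sketch below is an illustration under assumptions: the function names and the NODES variable are hypothetical, root SSH access between nodes is assumed, and SSH round-trip latency adds a small error, so treat the result as approximate.

```shell
# Absolute difference, in seconds, between two epoch timestamps.
skew_seconds() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# Hypothetical usage: compare this node's clock with each peer.
# NODES is a placeholder for a space-separated list of peer hostnames.
check_cluster_skew() {
    for node in $NODES; do
        remote=$(ssh "root@$node" date -u +%s) || continue
        local_now=$(date -u +%s)
        echo "$node: $(skew_seconds "$local_now" "$remote")s"
    done
}
```

For example, after setting NODES to the peer hostnames, check_cluster_skew prints one line per reachable node; values well above a couple of seconds suggest that NTP is not working on that node.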
If there are no heap dump files, or if the issue is not resolved even after removing them, follow the steps below:
Go to the root directory and run the "du -sh *" command to check which directory is occupying the most space, then check whether anything in that directory can be removed.
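The du step above can be made easier to read by sorting the output. This is a minimal sketch with a hypothetical function name, largest_entries; it uses only the POSIX du options -x, -s, and -k. Like "du -sh *", it skips hidden entries at the top level.

```shell
# Hypothetical helper: show the largest entries directly under a
# directory (default: /), biggest first. Sizes are in KiB.
largest_entries() {
    dir="${1:-/}"
    count="${2:-10}"
    # -x : stay on the filesystem of each argument
    # -s : one summary line per argument
    # -k : 1024-byte units, so sort -rn can order the lines numerically
    du -xsk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

Running largest_entries / and then repeating it on the top result (for example largest_entries /var) narrows down where the space went.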
Example: In one case, the var directory showed the largest size, and in /var/log the messages file was unusually large. Running the "> messages" command truncated the file, after which the root partition freed up space.
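Truncating in place, rather than deleting, is useful here because a file that a running process still holds open does not release its disk space when it is merely deleted. A small sketch of the same truncation with a safety check follows; the function name truncate_in_place is hypothetical.

```shell
# Hypothetical helper: empty a regular file without removing it,
# so any process that has it open keeps writing to the same file.
truncate_in_place() {
    if [ ! -f "$1" ]; then
        echo "not a regular file: $1" >&2
        return 1
    fi
    # ':' is the shell no-op; the redirection truncates the file to 0 bytes
    : > "$1"
}
```

For the case described above, this would be called as truncate_in_place /var/log/messages.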