This information describes the symptoms, cause, impact, and resolution of the issue where Aria Operations for Logs nodes become disconnected due to high root partition utilization caused by a large .hprof file.
Symptoms:
FAILED: Unable to get user data. Possible cassandra is down
Running "df -h" on the respective nodes shows that the root partition is full, for example /dev/sda5 is at 100%.
There is a ".hprof" Java heap dump with a very large file size that keeps growing.
In the journalctl logs you see log entries about NTP time drift.
passwd: Authentication token manipulation error passwd: password unchanged
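The root partition usage described in the symptoms can be confirmed from a shell on the affected node. The following is a minimal sketch, not a supported tool: the function names and the 95 percent warning threshold are assumptions made for illustration, and it relies only on the POSIX "df -P" output format.

```shell
# Print the usage percentage (without the % sign) of the filesystem
# that holds the given path. root_usage_pct is an illustrative name.
root_usage_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a one-line summary; the 95 percent threshold is an assumed value.
report_root_usage() {
    pct=$(root_usage_pct /)
    if [ "$pct" -ge 95 ]; then
        echo "root partition is at ${pct}%: nearly full"
    else
        echo "root partition is at ${pct}%"
    fi
}

# Usage:
# report_root_usage
```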
Environment:
VMware Aria Operations for Logs 8.14.x and later
Cause:
A heap dump is created when a Java process encounters a problem while running. In this case, the node suffers from time drift, which causes the loginsight service to keep crashing.
Files in the log directory can also occupy an unusually large amount of disk space. This could be due to issues with log file rotation.
Resolution:
Find the .hprof file(s) using the following command:
find / -name \*.hprof -exec ls -lah {} \;
NOTE: The command may appear to hang for a couple of minutes or more, but it should eventually complete and list any files found. The .hprof files will normally be in the /usr/lib/loginsight directory.
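If the search is slow, it can be limited to a single filesystem and sorted by size. This is a sketch under two assumptions: the heap dump sits on the root filesystem, as in the case described here, and the installed find supports the -xdev option. The function name list_hprof is made up for illustration.

```shell
# List .hprof files under a directory, largest first (sizes in KB),
# without descending into other mounted filesystems (-xdev).
# list_hprof is an illustrative name, not a product command.
list_hprof() {
    find "${1:-/}" -xdev -type f -name '*.hprof' -exec du -k {} + 2>/dev/null | sort -rn
}

# Usage (directory taken from the note above):
# list_hprof /usr/lib/loginsight
```

Restricting the search with -xdev skips separately mounted data partitions, which is usually what makes a search from / take so long.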
Optional: You can copy .hprof files to your local machine before deleting them.
Then delete the .hprof file(s), if any, and restart the loginsight service by running:
service loginsight restart
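The delete and restart steps above can be sketched as a small helper. The function name purge_hprof is hypothetical, and the sketch assumes a find that supports -delete. It requires an explicit directory so that nothing is removed by accident, and it prints each file as it deletes it.

```shell
# Delete every .hprof file under the given directory and print its path.
# purge_hprof is an illustrative name; a directory argument is mandatory.
purge_hprof() {
    dir=${1:?directory required}
    find "$dir" -xdev -type f -name '*.hprof' -print -delete
}

# Usage on an affected node (directory taken from the note above):
# purge_hprof /usr/lib/loginsight
# service loginsight restart
# df -h /
```

Running df -h / afterwards confirms that space on the root partition was actually released.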
To prevent this issue from recurring, ensure that the NTP server is reachable from the node and that the clocks on all nodes in the cluster differ by no more than a couple of seconds.
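One rough way to compare clocks between nodes is to read the epoch time of a remote node over ssh and compare it with the local clock. The sketch below is illustrative only: the host name, the function names, and the MAX_DRIFT value of 2 seconds are assumptions, and it presumes root ssh access between the nodes.

```shell
# Maximum tolerated clock difference in seconds (assumed value).
MAX_DRIFT=2

# Absolute difference, in seconds, between two epoch timestamps.
abs_offset() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    echo "$diff"
}

# Compare the local clock with a remote node's clock over ssh.
# The host argument is a placeholder for a real cluster node.
check_drift() {
    remote_now=$(ssh "root@$1" date +%s) || return 1
    offset=$(abs_offset "$(date +%s)" "$remote_now")
    if [ "$offset" -gt "$MAX_DRIFT" ]; then
        echo "$1: clock differs by ${offset}s"
        return 1
    fi
    echo "$1: clock within ${MAX_DRIFT}s"
}

# Usage (hypothetical host name):
# check_drift node2.example.com
```

The ssh round trip adds a little to the measured offset, so a result of one or two seconds does not by itself indicate drift.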
If there are no heap dump files, or if the issue is not resolved after removing them, follow the steps below:
Go to the root directory and run the du -sh * command to see which directory is using the most space, then check whether anything in that directory can be removed.
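The du output is easier to read when sorted by size. The following sketch assumes a du that accepts the -x, -a, and -d options (GNU coreutils does); the function name largest_entries is made up for illustration.

```shell
# Show the largest files and directories directly under a directory,
# sizes in KB, largest first, staying on the same filesystem (-x).
# The first line is normally the total for the directory itself.
# largest_entries is an illustrative name; $2 limits the number of lines.
largest_entries() {
    du -xk -a -d 1 "${1:-/}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Usage: drill down one level at a time, for example
# largest_entries /
# largest_entries /var
# largest_entries /var/log
```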
Example: In one case, the var directory showed the largest size, and in /var/log the messages file was found to be unusually large.
Running the "> messages" command from /var/log truncated the file, after which space was freed on the root partition.
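The truncation in this example can be wrapped in a small helper. This is a sketch with a hypothetical function name. It empties the file in place rather than deleting it, so the file keeps its inode and a process that has it open continues to write to the same file.

```shell
# Empty a file in place without removing it.
# truncate_file is an illustrative name; a file argument is mandatory.
truncate_file() {
    f=${1:?file required}
    : > "$f"
}

# Usage (file taken from the example above):
# truncate_file /var/log/messages
```

Deleting a log file that a running process still holds open does not release its disk space until the process closes it, which is why truncation is used here.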