Aria Operations for Logs shows as disconnected and has 100% utilization on the root partition

Article ID: 312243

Products

VMware Aria Suite

Issue/Introduction

This article describes the symptoms, cause, impact, and resolution of an issue in which Aria Operations for Logs nodes become disconnected because a large .hprof file fills the root partition.

Symptoms:

  • One or more nodes show as disconnected in the UI.
  • Login to the UI fails.
  • The UI becomes inaccessible after a reboot.
  • The output of "df -h" on the affected nodes shows that the root partition is full.
  • The file filling the partition is a ".hprof" Java heap dump that is very large and still growing.
  • The journalctl logs contain entries about NTP time drift.
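
The disk and log checks in the symptoms above can be sketched as a pair of small shell helpers. This is a minimal sketch, not a supported tool: the function names are made up for illustration, and the grep pattern is only a guess at how the drift entries are worded on a given node.

```shell
# Extract the "Capacity" column (without the % sign) from `df -P` output
# read on standard input. Line 1 is the header, line 2 is the filesystem.
parse_capacity() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the used percentage of the filesystem that holds a path (default: /).
# -P forces POSIX output so each filesystem stays on a single line.
usage_percent() {
    df -P "${1:-/}" | parse_capacity
}

# Keep only the lines that mention NTP or time drift (assumed wording),
# read from standard input.
ntp_lines() {
    grep -iE 'ntp|time drift'
}

# Example, run on the affected node:
#   usage_percent /
#   journalctl --since "1 hour ago" | ntp_lines
```

A result of 100 from usage_percent / corresponds to the full root partition described above.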

Environment

VMware Aria Operations for Logs 8.14.x and later

Cause

  • A heap dump (.hprof file) is written when the Java process encounters a failure. In this case, time drift on a node causes the loginsight service to crash repeatedly.

  • One or more files in the log directory occupy an unusually large amount of disk space. This can be caused by problems with log file rotation.
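
To put a number on the time drift mentioned above, the clocks of two nodes can be compared as epoch seconds. The helper below is an illustrative sketch only: the function name is hypothetical, and how the second timestamp is collected (for example over SSH from another node) is an assumption about the environment.

```shell
# Print the absolute difference, in seconds, between two epoch timestamps.
clock_skew_seconds() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Example (other-node is a placeholder hostname):
#   clock_skew_seconds "$(date +%s)" "$(ssh root@other-node date +%s)"
```

The SSH round trip adds some delay of its own, so the result is only a rough indication of drift.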

Resolution

  • Find the .hprof file(s) using the following command:

    • find / -name \*.hprof -exec ls -lah {} \;

  • Delete the .hprof file(s), if any, and then restart the loginsight service by running:

    • service loginsight restart

  • To prevent the issue from recurring, ensure that the NTP server is reachable from the node and that the clocks on all nodes in the cluster differ by no more than a couple of seconds.
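
The find-and-delete steps above can also be wrapped in two helpers, sketched here under some assumptions. The function names are hypothetical. The -xdev option is an addition not present in the command above; it keeps the search on the starting filesystem, which seems appropriate when only the root partition is full.

```shell
# List heap dumps under a directory with sizes in KiB, largest first.
# -xdev (an assumption, see lead-in) keeps find on one filesystem.
list_hprof() {
    find "${1:-/}" -xdev -type f -name '*.hprof' -exec du -k {} + 2>/dev/null | sort -rn
}

# Delete the heap dumps under a directory, printing each path before removal.
remove_hprof() {
    find "${1:-/}" -xdev -type f -name '*.hprof' -print -exec rm -f {} \; 2>/dev/null
}

# Example, run as root on the affected node:
#   list_hprof /
#   remove_hprof /
#   service loginsight restart
```

Reviewing the output of list_hprof before calling remove_hprof is a reasonable precaution, since the deletion cannot be undone.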

If there are no heap dump files, or if the issue persists after the heap dump files are removed, follow the steps below:

Go to the root directory and run the du -sh * command to see which directory occupies the most space, then check whether anything in that directory can be removed.

Example: In one case, the var directory showed the largest size, and in /var/log the messages file was unusually large.

Running the "> messages" command truncated the file, after which space was freed on the root partition.
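
The fallback procedure above can be sketched as two helpers: one that ranks the entries of a directory by size, and one that truncates a file in place. The function names are illustrative, and the sketch reports sizes in KiB (du -k) so that a plain numeric sort works, rather than using the human-readable -h output.

```shell
# Show the largest entries directly under a directory, sizes in KiB.
# $1 is the directory (default: /), $2 is how many entries to show (default: 5).
# -x keeps du on one filesystem so other mounts are not counted.
largest_entries() {
    ( cd "${1:-/}" && du -xsk -- * 2>/dev/null ) | sort -rn | head -n "${2:-5}"
}

# Empty a file in place without deleting it, like "> messages".
truncate_file() {
    : > "$1"
}

# Example, run as root on the affected node:
#   largest_entries /
#   largest_entries /var/log
#   truncate_file /var/log/messages
```

Truncating rather than deleting is the safer choice for a log that a running process still has open: a deleted file keeps occupying disk space until the process closes it, whereas truncation releases the space immediately.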

Additional Information

Impact/Risks:
  • Aria Operations for Logs nodes become unavailable, which affects the collection and analysis of logs.
  • High disk utilization can lead to performance degradation and potential data loss.