Aria Operations for Logs node shows disconnected and has 100% utilization on the root partition


Article ID: 312243



Products

VMware Aria Suite

Issue/Introduction

  • This article describes the symptoms, cause, impact, and resolution of an issue in which Aria Operations for Logs nodes become disconnected because a large .hprof file fills the root partition.


Symptoms:
  • One or more nodes show as disconnected in the UI.
  • Running "df -h" on the affected nodes shows the root partition at 100% utilization.
  • The file filling the partition is a ".hprof" Java heap dump that is very large and still growing.
  • The journalctl logs contain entries about NTP time drift.
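
The checks above can be sketched as two small shell helpers. This is a minimal sketch, not a product tool: the function names root_usage_pct and list_hprof are hypothetical, and the grep pattern in the journalctl example is only a guess at what the time-drift entries look like on a given node.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of the product.

# Print the usage of the root partition as a bare integer (for example 100).
root_usage_pct() {
    df -P / | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# List .hprof files under a directory (default /) as "<bytes> <path>",
# largest first. -xdev keeps the search on the starting filesystem.
list_hprof() {
    find "${1:-/}" -xdev -type f -name '*.hprof' \
        -exec stat -c '%s %n' {} + 2>/dev/null | sort -rn
}

# Usage on an affected node:
#   root_usage_pct
#   list_hprof /
#   journalctl --no-pager --since "1 hour ago" | grep -iE 'ntp|time'  # pattern is an assumption
```

Because list_hprof sorts by size, the heap dump that is filling the partition should appear on the first line. Running it twice a short time apart shows whether that file is still growing.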


Environment

VMware Aria Operations for Logs 8.14.x

Cause

  • The Java runtime writes a heap dump (.hprof file) when it detects a fault in a running process. In this case, time drift on a node causes the loginsight service to crash repeatedly, and the resulting heap dump fills the root partition.

Resolution

  • Find the .hprof file(s) with the following command:
    • find / -name \*.hprof -exec ls -lah {} \;
  • Delete any .hprof files that were found, then restart the loginsight service:
    • service loginsight restart
  • To prevent the issue from recurring, make sure the NTP server is reachable from every node and that the clocks on all nodes in the cluster differ by no more than a couple of seconds.
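
One way to check the last point is to collect the current epoch time from each node and compare the values. The following is a rough sketch under stated assumptions: max_skew is a hypothetical helper, the node names are placeholders, and the usage example assumes root SSH access to every node. Because the nodes are queried one after another, SSH connection latency slightly inflates the reported difference.

```shell
#!/usr/bin/env bash
# Sketch only: max_skew and the node names below are hypothetical.

# Given epoch timestamps (one per argument), print the largest
# difference between any two of them, in seconds.
max_skew() {
    local min=$1 max=$1 t
    for t in "$@"; do
        (( t < min )) && min=$t
        (( t > max )) && max=$t
    done
    echo $(( max - min ))
}

# Usage, assuming SSH access as root to each node:
#   stamps=()
#   for n in node1.example.com node2.example.com; do
#       stamps+=("$(ssh "root@$n" date +%s)")
#   done
#   max_skew "${stamps[@]}"
```

A result of more than a few seconds suggests that at least one node is not synchronizing with the NTP server.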


Additional Information

Impact/Risks:
  • Aria Operations for Logs nodes become unavailable, which affects the collection and analysis of logs.
  • High disk utilization can lead to performance degradation and potential data loss.