In certain scenarios, the Aria/VCF Operations cluster may remain in a "Running/Waiting for Analytics" state and fail to come fully online. Investigation traced the issue to the root (/) partition being completely full, which prevented the cluster from coming online.
Environment
Aria Operations 8.x
VCF Operations 9.x
Cause
Running the df -h command showed that the root (/) partition was 100% full. To identify which directories were consuming the space, the command du -sh * was run from the root (/) directory.
This revealed that the /var/tmp/alerts directory was rapidly growing in size. Further investigation showed that a Log File plugin had been configured in Outbound Settings within Aria/VCF Operations, which was generating large volumes of alert logs stored under /var/tmp/alerts.
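The diagnostic steps above can be sketched as two small shell helpers. This is a minimal sketch, not a supported tool: the function names usage_pct and largest_dirs are hypothetical, and it assumes a df that supports -P and a du that supports -x, -k and -d (as GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Print the use percentage (digits only) of the filesystem that holds a path.
usage_pct() {
    df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# List the largest entries one level below a directory, in KiB,
# largest first. -x keeps du on a single filesystem so that other
# mounts such as /storage/db are not counted against the root partition.
# $1 = directory to inspect, $2 = number of entries to show (default 10).
largest_dirs() {
    du -xk -d 1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, usage_pct / prints the root partition usage, and largest_dirs /var/tmp 5 lists the five largest entries under /var/tmp, which is where the alerts directory would show up in this scenario.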
Resolution
To resolve the issue and restore cluster functionality, the following steps were taken:
Free Up Root Partition Space:
SSH to the affected Analytics node
Move the /var/tmp/alerts folder to a location with more available disk space using:
mv /var/tmp/alerts /storage/db/
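Before moving a large directory, it can help to confirm that the destination has enough free space. The following is a minimal sketch under that assumption; safe_move, dir_size_kb and avail_kb are hypothetical names, not part of the product.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Size of a directory tree in KiB.
dir_size_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# Free space in KiB on the filesystem that holds a path.
avail_kb() {
    df -Pk "$1" | awk 'NR==2 { print $4 }'
}

# Move a directory into a destination directory only if the
# destination filesystem has more free space than the source uses.
safe_move() {
    src=$1
    dest=$2
    if [ ! -d "$src" ] || [ ! -d "$dest" ]; then
        echo "Source or destination directory is missing" >&2
        return 1
    fi
    need=$(dir_size_kb "$src")
    have=$(avail_kb "$dest")
    if [ "$have" -le "$need" ]; then
        echo "Not enough space in $dest: need ${need}K, have ${have}K" >&2
        return 1
    fi
    mv "$src" "$dest"/
}
```

With these helpers, safe_move /var/tmp/alerts /storage/db performs the same move as the command above, but refuses to start if /storage/db cannot hold the data.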
Restart Cluster:
Log in to the Admin UI of the primary node as admin (https://<Primary_Node_FQDN>/admin)
Take the cluster offline.
Once the cluster is offline, bring it back online.
Update Log File Plugin Configuration:
Log in to the Product UI as admin (https://<Primary_Node_FQDN>/ui)
In the left panel, navigate to Configuration > Outbound Settings.
Edit the respective Log File plugin configuration to store logs in a new directory:
/storage/db/alerts
This ensures that future alerts will be stored in a directory with adequate space, preventing recurrence of this issue.
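After the plugin is reconfigured, a quick check can confirm that the new alert directory exists and is not on the root partition. This is a minimal sketch; mount_of and check_alert_dir are hypothetical names, and it assumes mount points without spaces in their paths.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Print the mount point of the filesystem that holds a path.
mount_of() {
    df -P "$1" | awk 'NR==2 { print $6 }'
}

# Return 0 if the directory exists on a filesystem other than /,
# 1 if it is still on the root partition, 2 if it does not exist.
check_alert_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir does not exist" >&2
        return 2
    fi
    mp=$(mount_of "$dir")
    if [ "$mp" = "/" ]; then
        echo "WARNING: $dir is still on the root partition" >&2
        return 1
    fi
    echo "$dir is on $mp"
}
```

Running check_alert_dir /storage/db/alerts on the node should report a mount point other than / if the directory sits on a separate partition, as this article assumes.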