Analytics down - Aria Operations

Article ID: 403206

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • Aria Operations UI is down
  • Aria Operations Admin UI shows nodes are "Waiting for Analytics"
  • There are no storage utilization issues on the cluster as per the Resolution section of Troubleshooting Storage Issues in Aria Operations
  • The following messages are present in the file /storage/log/vcops/log/analytics-wrapper.log:

    2025/05/26 11:09:36 | INFO   | jvm 1    | java.lang.OutOfMemoryError: Java heap space

    2025/05/26 11:09:36 | INFO   | jvm 1    | Dumping heap to /storage/db/vcops/heapdump/java_pid#####.hprof ...

  • Heap dump files have been recently created and can be found in the directory below:

    /storage/db/vcops/heapdump/
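
The last two symptoms can be checked from a shell on the node. The following is a minimal sketch, not an official tool: the function name check_analytics_oom is hypothetical, and the default paths are the ones named in this article.

```shell
# Hypothetical helper: report OutOfMemory evidence for the analytics service.
# Arguments are optional; defaults are the paths named in this article.
check_analytics_oom() {
    log_file="${1:-/storage/log/vcops/log/analytics-wrapper.log}"
    dump_dir="${2:-/storage/db/vcops/heapdump}"

    # Count log lines reporting the heap-space OutOfMemoryError.
    # -F treats the pattern as a fixed string, -c prints the match count.
    oom_count=$(grep -cF "java.lang.OutOfMemoryError: Java heap space" "$log_file" 2>/dev/null)
    echo "OOM lines: ${oom_count:-0}"

    # Count heap dump files (*.hprof) directly inside the dump directory.
    dump_count=$(find "$dump_dir" -maxdepth 1 -name '*.hprof' 2>/dev/null | wc -l | tr -d ' ')
    echo "Heap dumps: $dump_count"
}
```

Calling check_analytics_oom with no arguments uses the default paths. Non-zero counts for both values are consistent with the symptoms listed above.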

Environment

Aria Operations 8.18.3

Cause

A high number of instanced guest filesystem metrics stored per object exhausts the Java heap of the analytics service, ultimately resulting in the Java 'OutOfMemory' error.

Resolution

  1. Run the following command on the Primary node:

    su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb -p 5433 -c \"select * from metric_key order by metric_id \"" > /storage/db/metric_keys.txt


  2. Get the total number of metric keys with the following command:

    wc -l < /storage/db/metric_keys.txt

  3. Get the number of Kubernetes-related filesystem metric keys using the following command:

    grep -c "var/lib/kubelet/pods/" /storage/db/metric_keys.txt
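
The two counts can also be combined into a single percentage. The following is a minimal sketch under stated assumptions: the function name k8s_key_share is hypothetical, and the input is the export file created in step 1. The total counts every line in the file, including the few header and footer lines that psql writes by default, so treat the result as approximate.

```shell
# Hypothetical helper: share of Kubernetes pod filesystem keys in the export.
# The argument is optional; the default is the file written in step 1.
k8s_key_share() {
    keys_file="${1:-/storage/db/metric_keys.txt}"

    # Total lines in the export (includes psql header/footer lines).
    total=$(wc -l < "$keys_file")
    # Lines that reference a kubelet pod path.
    k8s=$(grep -c "var/lib/kubelet/pods/" "$keys_file")

    # awk performs the floating-point division; guard against an empty file.
    awk -v k="$k8s" -v t="$total" 'BEGIN {
        if (t == 0) { print "no keys found"; exit }
        printf "%d of %d lines (%.1f%%)\n", k, t, 100 * k / t
    }'
}
```

Run k8s_key_share with no arguments after step 1. A result well above 50% indicates that Kubernetes-related filesystem keys make up the majority of the stored keys.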


Compare the number of Kubernetes-related filesystem keys with the total number of keys.
If the Kubernetes-related filesystem keys make up the majority of the total, continue with the steps below to mitigate the issue:

  1. Open the Aria Operations UI and browse to Operations >> Configurations >> Policy Definition
  2. Select the default policy (Priority = D)
  3. Click "EDIT POLICY"
  4. Click "Metrics and Properties"
  5. Select Object Type = "Virtual Machine"
  6. Expand "Metrics" and then "Guest File System"
  7. For the metrics:

    Guest File System:/|Partition Utilization (%),
    Guest File System:/|Partition Utilization (GB),
    Guest File System:/|Partition Capacity (GB)


    Select these metrics one at a time and click the blue "Activated" value in the Instanced State column.
    In the pop-up window, set the Collect toggle to off and click Save. The Instanced State column now shows "Deactivated".

  8. Save the Policy
  9. Restart the cluster from Admin UI as per Rebooting nodes in Aria Operations

Additional Information

Note: After collection of the metrics listed in the Resolution is turned off, a per-partition breakdown is no longer available. The data is represented by the corresponding aggregated metrics instead.

If it is required to collect metrics for specific partitions, they can be specified as per the configuration screenshot below: