In VCF Operations 9.x, you observe the following symptoms:
An active alert, "One or more VMware Cloud Foundation Operations services on a node are down", persists on a cluster node.
The Admin UI shows the cluster and nodes as "Online," but the alert does not clear.
Restarting the cluster (Offline/Online) via the Admin UI does not resolve the issue.
The VCF Operations Analytics-<node> 'Up Status' metric for the affected node is empty (no data) or "0".
In /data/vcops/log/analytics-wrapper.log, the service restarts multiple times a day but recovers on its own:
>>> Analytics is going offline...
In /data/vcops/log/java_error.log, a fatal error is recorded:
# SIGSEGV (0xb) at pc=0x################, pid=####, tid=#####
Running journalctl -xeu analytics.service as the root account from an SSH session on the node shows:
analytics.service: Failed with result 'exit-code'.
Subject: Unit failed
Defined-By: systemd
The unit analytics.service has entered the "failed" state with result 'exit-code'
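The two log checks described above can be scripted to get a rough sense of how often the service is cycling. The following is a minimal sketch, not a supported tool: it assumes a Python 3 interpreter is available on the node (or that the logs have been copied to a workstation), and the helper names count_marker and read_lines are hypothetical. It only reads the two log files named in the symptoms and counts the lines that contain the marker strings quoted there.

```python
from pathlib import Path

# Paths and marker strings come from the symptoms listed above.
WRAPPER_LOG = Path("/data/vcops/log/analytics-wrapper.log")
JAVA_ERROR_LOG = Path("/data/vcops/log/java_error.log")
OFFLINE_MARKER = "Analytics is going offline"
SIGSEGV_MARKER = "SIGSEGV"


def count_marker(lines, marker):
    """Return how many lines contain the marker substring (hypothetical helper)."""
    return sum(1 for line in lines if marker in line)


def read_lines(path):
    """Read a log file leniently; return an empty list if it does not exist."""
    if not path.exists():
        return []
    return path.read_text(errors="replace").splitlines()


if __name__ == "__main__":
    offline = count_marker(read_lines(WRAPPER_LOG), OFFLINE_MARKER)
    crashes = count_marker(read_lines(JAVA_ERROR_LOG), SIGSEGV_MARKER)
    print(f"offline transitions logged: {offline}")
    print(f"SIGSEGV entries logged: {crashes}")
```

Repeated offline transitions together with SIGSEGV entries are consistent with the symptoms in this article, but the counts alone do not confirm the cause.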
VCF Operations 9.0.x
This issue may be caused by resource contention and service crashes resulting from object exhaustion. In environments with high object churn (such as Kubernetes), deleted container objects may accumulate in the database without being properly purged. A high volume of these orphaned objects (e.g., 40,000+) leads to database instability, causing the Analytics service to crash repeatedly with SIGSEGV.
To resolve this issue, you must perform database maintenance to purge the orphaned objects.
Contact Broadcom Support and reference this KB so the specific database cleanup steps can be completed safely.
Once the cleanup is complete, the Analytics service will stabilize and the alert should clear.