Alert: "One or more VMware Cloud Foundation Operations services on a node are down"

Article ID: 427703

Products

VCF Operations

Issue/Introduction

In VCF Operations 9.x, you observe the following symptoms:

  • An active alert, "One or more VMware Cloud Foundation Operations services on a node are down," persists on a cluster node.

  • The Admin UI shows the cluster and nodes as "Online," but the alert does not clear.

  • Restarting the cluster (Offline/Online) via the Admin UI does not resolve the issue.

  • The VCF Operations Analytics-<node> 'Up Status' metric for the affected node is empty (no data) or "0".

  • When the alert is cancelled, it is generated again immediately or shortly afterward.

  • /data/vcops/log/analytics-wrapper.log shows the service restarting multiple times a day and recovering on its own:

     
    >>> Analytics is going offline...

     

  • In /data/vcops/log/java_error.log, a fatal error is recorded:

    # A fatal error has been detected by the Java Runtime Environment:

    #

    # SIGSEGV (0xb) at pc=0x################, pid=####, tid=#####

     

  • Running journalctl -xeu analytics.service as the root account from an SSH session on the node shows:

    analytics.service: Failed with result 'exit-code'.

    Subject: Unit failed

    Defined-By: systemd

    The unit analytics.service has entered the "failed" state with result 'exit-code'
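
The log symptoms above can be checked together with a short script. The following is a minimal sketch, not a supported tool: it assumes Python 3 is available wherever the logs (or copies of them) can be read, and the helper names read_text_if_present and summarize_symptoms are hypothetical. The marker strings and log paths are taken from the symptoms listed in this article; journalctl output, if wanted, has to be captured separately and passed in as text.

```python
from pathlib import Path

# Marker strings copied from the symptoms described in this article.
OFFLINE_MARKER = "Analytics is going offline"
FATAL_MARKER = "A fatal error has been detected by the Java Runtime Environment"
SIGSEGV_MARKER = "SIGSEGV"
UNIT_FAILED_MARKER = "analytics.service: Failed with result 'exit-code'"


def read_text_if_present(path):
    """Return the file's text, or an empty string if it cannot be read."""
    try:
        return Path(path).read_text(errors="replace")
    except OSError:
        return ""


def summarize_symptoms(wrapper_log, java_error_log, journal_output=""):
    """Hypothetical helper: report which log symptoms appear in the given text."""
    return {
        # Number of times the Analytics service announced it was going offline.
        "offline_events": wrapper_log.count(OFFLINE_MARKER),
        # True only if the JVM fatal-error banner and a SIGSEGV line both appear.
        "jvm_sigsegv": FATAL_MARKER in java_error_log
        and SIGSEGV_MARKER in java_error_log,
        # True if systemd recorded the unit failing with result 'exit-code'.
        "unit_failed": UNIT_FAILED_MARKER in journal_output,
    }


if __name__ == "__main__":
    summary = summarize_symptoms(
        read_text_if_present("/data/vcops/log/analytics-wrapper.log"),
        read_text_if_present("/data/vcops/log/java_error.log"),
    )
    for name, value in summary.items():
        print(f"{name}: {value}")
```

A non-zero offline_events count together with jvm_sigsegv reported as True matches the pattern described here. The script only reads logs and changes nothing on the node, and a match does not by itself confirm the cause, so it is useful mainly as evidence to attach when opening a support case.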

Environment

VCF Operations 9.0.x

Cause

This issue may be caused by resource contention and service crashes resulting from object exhaustion. In environments with high object churn (such as Kubernetes), deleted container objects may accumulate in the database without being properly purged. A high volume of these orphaned objects (e.g., 40,000+) leads to database instability, causing the Analytics service to crash repeatedly (SIGSEGV).

Resolution

To resolve this issue, you must perform database maintenance to purge the orphaned objects.

 

  • Contact Broadcom Support and reference this KB so that the specific database cleanup steps can be completed safely.

  • Once the cleanup is complete, the Analytics service will stabilize and the alert should clear.