Aria Operations nodes are failing and recovering one after another

Article ID: 422622

Updated On:

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

Your Aria Operations cluster experiences instability and exhibits the following symptoms:

  • In the admin interface (https://<FQDN_of_Operations>/admin), the node status loads intermittently and shows a loading icon
  • You are intermittently not able to log into the Aria Operations UI and the admin interface
  • Cloud proxies intermittently show a status of Offline

Additionally, you have verified the cluster is configured properly and all networking, DNS, NTP and sizing requirements are met per VMware Aria Operations 8.18 Requirements.

Environment

Aria Operations 8.18.x

Resolution

To resolve this issue, follow these steps:

  1. Log into the primary node via SSH with the root account and restart the vmware-vcops service:
    systemctl restart vmware-vcops
  2. Log into the primary node admin interface at https://<FQDN_or_IP_of_primary_node>/admin
  3. Click on the TAKE CLUSTER OFFLINE button
  4. After the cluster successfully goes offline, click on the BRING CLUSTER ONLINE button

The cluster should now be stable, with all nodes and cloud proxies showing as Online.
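
After the restart in step 1, it can help to confirm that systemd reports the vmware-vcops service as active before moving on to the admin interface. The following is a minimal sketch, not part of the official procedure: the function name wait_for_active, the retry count, and the delay are all hypothetical choices. It checks only the systemd unit state, not cluster or node health.

```shell
# Hypothetical helper (not shipped with Aria Operations): poll a systemd
# unit until it reports "active", or give up after a number of attempts.
#   $1 = unit name, $2 = max attempts (default 30), $3 = seconds between attempts (default 10)
wait_for_active() {
  unit="$1"
  tries="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # "systemctl is-active --quiet" exits 0 only when the unit is active
    if systemctl is-active --quiet "$unit"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

On the primary node, this could be run as root right after the restart, for example: wait_for_active vmware-vcops && echo "vmware-vcops is active". A non-zero exit status means the unit did not report active within the assumed window; in that case, check the service before taking the cluster offline.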

Additional Information

If the above procedure doesn't resolve the issue, a manual cluster restart may be necessary. See: Rebooting nodes in Aria Operations


Japanese version: Aria Operations ノードが次々と障害を起こし、復旧を繰り返している