Aria Operations Cluster Failed to Come Online Due to Incorrect is_admin Node Configuration on casa.db.script

Article ID: 439343

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  1. The VMware Aria Operations cluster fails to start, with the UI showing the status as "Cluster failed to come online."

  2. In the Aria Operations admin portal, the primary, replica, and data nodes are listed correctly and show as running/online. However, the underlying cluster management database has mismatched role assignments, which prevent the cluster from starting.

Environment

8.18.5

Cause

The casa.db.script configuration file acts as the source of truth for node roles and contains incorrect entries. Specifically, the primary node is erroneously configured with "is_admin_node": false. As a result, the cluster manager fails to recognize an admin node during startup, preventing the cluster from coming online.

Location of the casa.db.script file:

/storage/db/casa/webapp/hsqldb/


Note: The following command prints the cluster membership data in a more readable form:

# sed -nre "/clusterMembership/ s/^[^']+'([^']+)','([^']+)'.*/\2/p" /storage/db/casa/webapp/hsqldb/casa.db.script | python -m json.tool


The command extracts the second single-quoted field from each line of casa.db.script that contains clusterMembership and pretty-prints it as JSON.
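To see at a glance which nodes carry the admin flag, the flag can also be pulled out with grep. This is a minimal sketch, not part of the documented procedure: the function name list_admin_flags is hypothetical, and it assumes each node entry stores the flag immediately before the address, in the form "is_admin_node": <value>,"ip_address":"<address>" shown in the Resolution section.

```shell
# Hypothetical helper: print every is_admin_node flag together with its ip_address.
# Assumes the flag is immediately followed by the ip_address key;
# adjust the pattern if the entries in your file are laid out differently.
list_admin_flags() {
    grep -oE '"is_admin_node": *(true|false),"ip_address":"[^"]*"' "$1"
}

# Example (run as root on a node):
# list_admin_flags /storage/db/casa/webapp/hsqldb/casa.db.script
```

Each match is printed on its own line. In the failure described here, the line for the primary node's address shows false.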

Resolution

To fix the issue, update the casa.db.script file with the correct entries.

Note: Perform the steps below on every node.

  1. Log into the Aria Operations admin portal as the admin user.

  2. Verify the roles of the primary and replica nodes in the UI.

  3. Click "Take Cluster Offline" to safely bring the cluster down.

  4. Take a virtual machine snapshot of all Aria Operations nodes.

    Snapshot Creation in VMware Aria Operations

    Shutdown and Startup sequence for VCF/Aria Operations cluster

  5. Log in to each node as root using SSH.

  6. Run the following command to stop the casa service:

    # service vmware-casa stop


  7. Navigate to the Casa database directory:

    # cd /storage/db/casa/webapp/hsqldb


  8. Create a backup of the configuration file:

    # cp -p casa.db.script casa.db.script.bkp


  9. Edit the casa.db.script file using a text editor such as vi.

  10. Locate the entry for the primary node, which will appear similar to:

    "is_admin_node": false,"ip_address":"<REDACTED_IPS>"


  11. Modify the entry to set the admin node flag to true:

    "is_admin_node":true


  12. Save the file and exit the text editor (:wq!).

  13. Run the following command to start the casa service:

    # service vmware-casa start


  14. Return to the Aria Operations admin portal and bring the cluster online.
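
Steps 8 through 11 can also be sketched as a small shell function for cases where hand-editing in vi is error-prone. This is an illustration under assumptions, not the documented procedure: the function name set_admin_flag is hypothetical, it relies on GNU sed (for the -i and -E options), and it assumes the primary node's entry has the form "is_admin_node": false,"ip_address":"<address>" shown in step 10. It writes the same casa.db.script.bkp backup as step 8 before changing anything, and it should only be run while the casa service is stopped.

```shell
# Hypothetical helper: set is_admin_node to true for the entry with the given IP.
# Usage: set_admin_flag <path-to-casa.db.script> <primary-node-ip>
set_admin_flag() {
    file=$1
    ip=$2
    # Escape the dots so the IP address matches literally in the sed pattern.
    ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
    # Back up the file first, preserving mode and timestamps (step 8).
    cp -p "$file" "$file.bkp" || return 1
    # Flip only the flag that directly precedes this node's ip_address.
    sed -i -E "s/\"is_admin_node\": *false,\"ip_address\":\"$ip_re\"/\"is_admin_node\":true,\"ip_address\":\"$ip\"/" "$file"
    # Print the difference so the edit can be reviewed before casa is restarted.
    diff "$file.bkp" "$file"
    return 0
}
```

For example: set_admin_flag /storage/db/casa/webapp/hsqldb/casa.db.script <primary-node-ip>. If diff prints nothing, the pattern did not match and the file should be edited by hand as described in steps 9 through 12.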