1. The Continuous Availability (CA) cluster is stuck at "Going online", and the nodes are stuck at "waiting for analytics" after a site-wide network outage.
2. All nodes' VMs are up and working normally in vCenter.
3. Running df -h in an SSH session on each node shows no disk space issues.
4. Running the command vrops-status shows Slice Online-true, but analytics is not running.
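The disk space check in symptom 3 can be made less error-prone with a small filter over the df output. The following is a minimal sketch, not part of the product: the function name full_filesystems and the 90 percent threshold are assumptions chosen for illustration. It reads POSIX-format output (df -P) on standard input and prints only the mount points at or above the threshold.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Aria Operations): list filesystems
# whose usage is at or above a threshold percentage.
# Reads `df -P` style output on stdin; the default threshold of 90 is an assumption.
full_filesystems() {
    threshold="${1:-90}"
    # Skip the header row, strip the % sign from column 5, compare numerically,
    # and print "<mount point> <usage>" for each match.
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= t) print $6 " " $5
    }'
}
```

On a node, it could be run as `df -P | full_filesystems 90`. No output means no filesystem is at or above the threshold, which matches the symptom described above.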
Aria Operations 8.18.x
The nodes might not have been powered off and powered back on in the right order.
1. SSH into the Primary node as root, and manually take the cluster offline by running the following command:
$VMWARE_PYTHON_BIN $VCOPS_BASE/../vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py --action bringSliceOffline --offlineReason "Manual"
2. If the manual method above does not work, run the following command to stop the casa service on any one node. This forces the cluster to fail, so the Cluster Status page shows "X Failure" instead of "Going online" and the FORCE TAKE CLUSTER OFFLINE button becomes available.
service vmware-casa stop
3. Click on FORCE TAKE CLUSTER OFFLINE to take the cluster offline.
4. Once the cluster is offline, go to vCenter and power-cycle the CA cluster nodes, following the Shutdown and Startup sequence for an Aria Operations cluster.
5. Once the nodes are powered on, log back in to the Aria Operations Admin UI (https://AriaOps_IP_or_FQDN/admin) and click the BRING CLUSTER ONLINE button. The cluster should come back online.
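After the power cycle in step 4, the Admin UI may take some time to respond. The following is a minimal sketch of a generic retry loop that could be used to wait for it before logging in. The function name wait_until, the attempt count, and the delay are assumptions for illustration, and the curl probe in the usage note only checks that the URL answers; it says nothing about cluster state.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds or the
# attempt limit is reached.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Run the probe command; stop as soon as it exits with status 0.
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the final one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, `wait_until 30 10 curl -k -s -f -o /dev/null https://AriaOps_IP_or_FQDN/admin` would probe the Admin UI up to 30 times, 10 seconds apart. The -k flag skips certificate verification, which is only appropriate if the node uses a self-signed certificate you already trust.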