High Availability stuck in activating state with new deployed replica node

Article ID: 402019

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • Enabling HA with a newly deployed replica node is stuck in the "Activating" state, the cluster stays in the "Going Online" state, and the node status is "Waiting for Analytics"
  • Signing in to a root SSH session on the newly deployed replica node shows the prompt root@localhost
  • Running the hostname command in the root SSH session returns localhost
  • The /etc/hosts file on the node shows localhost
  • /storage/log/vcops/log/casa.log reports a name resolution error
  • Both nslookup of the node's FQDN and the reverse nslookup fail
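
The checks listed above can be gathered into a few read-only helpers. This is a minimal sketch, not part of the official procedure: the function names is_placeholder_name, hosts_file_lists_name, and check_lookup are hypothetical, the FQDN and IP address in the usage comments are placeholders, and the sketch assumes a POSIX shell with awk and nslookup available on the node.

```shell
#!/bin/sh
# Hypothetical helpers for confirming the symptoms; they only read state.

# Succeeds when the name is empty or a localhost placeholder, not a real FQDN.
is_placeholder_name() {
    case "$1" in
        ""|localhost|localhost.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds when the hosts file ($1) maps some address to the name ($2).
# Lines that start with '#' are ignored.
hosts_file_lists_name() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Reports whether nslookup can resolve the name or address ($1).
# Assumes nslookup exits non-zero when the lookup fails.
check_lookup() {
    if nslookup "$1" >/dev/null 2>&1; then
        echo "lookup OK: $1"
    else
        echo "lookup FAILED: $1"
    fi
}

# Example use on a node (the FQDN and IP below are placeholders):
#   is_placeholder_name "$(hostname)" && echo "hostname is still localhost"
#   hosts_file_lists_name /etc/hosts ops-replica.example.com || echo "FQDN missing from /etc/hosts"
#   check_lookup ops-replica.example.com   # forward lookup
#   check_lookup 192.0.2.10                # reverse lookup
```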

Environment

Aria Operations 8.18.x

Cause

The node's FQDN and DNS settings were configured incorrectly.

Resolution

If you took snapshots as described in the KB article Snapshot Creation in VMware Aria Operations before attempting to enable HA, the recommendation is to:

  1. Revert to the snapshots
  2. Set the Domain Name property and the Domain Name Servers property to the correct values, using the document Configure vApp Properties as a guide. Power on the nodes as described in the KB article Shutdown and Startup sequence for Aria Operations cluster
  3. Bring the cluster back online from the Admin UI
  4. Enable HA
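
After the nodes are powered on with the corrected vApp properties, it can help to confirm that each node actually picked up the intended DNS servers before HA is enabled again. The following is a minimal sketch under an assumption that this article does not confirm: that the node's resolver settings are visible in /etc/resolv.conf. The helper names list_nameservers and has_nameserver are hypothetical, and 192.0.2.53 is a placeholder address.

```shell
#!/bin/sh
# Prints the nameserver addresses found in a resolv.conf-style file ($1).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Succeeds when the expected DNS server ($2) is listed in the file ($1).
# grep -Fx compares whole lines literally, so 192.0.2.5 does not match 192.0.2.53.
has_nameserver() {
    list_nameservers "$1" | grep -Fxq "$2"
}

# Example use on a node (assumes /etc/resolv.conf reflects the vApp DNS setting):
#   list_nameservers /etc/resolv.conf
#   has_nameserver /etc/resolv.conf 192.0.2.53 || echo "expected DNS server not configured"
```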

If you did not take snapshots before enabling HA:

  1. Log in to every node in the cluster (Primary/Replica/Data) over SSH as root
  2. Run the following command on every cluster node (Primary/Replica/Data) to stop the cluster services on that node and bring it offline:
    $VMWARE_PYTHON_BIN $VCOPS_BASE/../vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py --action bringSliceOffline --offline "Disable HA"
  3. Take a snapshot of every node in the cluster as described in the KB article Snapshot Creation in VMware Aria Operations
  4. Disable HA using the KB article How to enable/disable HA when the option is not available in Aria Operations UI
  5. If the cluster status still shows HA as activating, apply the KB article VMware Aria Operations High Availability (HA) stuck in "Failed to Deactivate" state or stuck in "Activating" state
  6. Set the Domain Name property and the Domain Name Servers property to the correct values, using the document Configure vApp Properties as a guide. Power on the nodes as described in the KB article Shutdown and Startup sequence for Aria Operations cluster
  7. Set the node state in casa.db.script to "Offline" by running the following command on each node:
    service vmware-casa stop;sleep 10;cp --backup=t /storage/db/casa/webapp/hsqldb/casa.db.script /storage/db/casa/webapp/hsqldb/casa.db.script.backup;sed -ri 's/"onlineState":"\w+"/"onlineState":"OFFLINE"/g;s/"initialization_state":"\w+"/"initialization_state":"NONE"/g;s/"online_state":"\w+"/"online_state":"OFFLINE"/g;s/"online_state_reason":"\w+"/"online_state_reason":""/g;s/"remove_node_state":"\w+"/"remove_node_state":"NONE"/g;s/"installation_state":"\w+"/"installation_state":"DONE"/g' /storage/db/casa/webapp/hsqldb/casa.db.script;sleep 2;service vmware-casa start;echo -e "\e[1;35mCluster Status has been modified\e[0m";grep "onlineState" /storage/db/casa/webapp/hsqldb/casa.db.script;
  8. Bring the cluster back online through the UI
  9. Enable HA
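
To see what the sed expressions in step 7 change before running them against the live file, the same substitutions can be applied to sample text on standard input. This is a minimal sketch: the function name reset_casa_state is hypothetical, the sample record is invented for illustration (real entries in casa.db.script contain more fields), and GNU sed is assumed for the -r option and the \w class.

```shell
#!/bin/sh
# Applies the same substitutions as step 7 to text on stdin and writes the
# result to stdout, so the effect can be inspected without touching CASA.
reset_casa_state() {
    sed -r \
        -e 's/"onlineState":"\w+"/"onlineState":"OFFLINE"/g' \
        -e 's/"initialization_state":"\w+"/"initialization_state":"NONE"/g' \
        -e 's/"online_state":"\w+"/"online_state":"OFFLINE"/g' \
        -e 's/"online_state_reason":"\w+"/"online_state_reason":""/g' \
        -e 's/"remove_node_state":"\w+"/"remove_node_state":"NONE"/g' \
        -e 's/"installation_state":"\w+"/"installation_state":"DONE"/g'
}

# Hypothetical sample record, not copied from a real casa.db.script.
sample='{"onlineState":"ONLINE","online_state":"ONLINE","online_state_reason":"HA","initialization_state":"RUNNING"}'
printf '%s\n' "$sample" | reset_casa_state
# prints {"onlineState":"OFFLINE","online_state":"OFFLINE","online_state_reason":"","initialization_state":"NONE"}
```

Because \w+ matches only letters, digits, and underscores, a value that contains a space is left unchanged by these expressions. The grep at the end of the step 7 command prints the resulting onlineState entries so the outcome can be checked on each node.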

 

Additional Information

If the issue persists after completing the steps above, generate a support bundle as described in the document Create a VMware Aria Operations Support Bundle. Then revert to the snapshots, raise a technical SR as described in the KB article Creating and managing Broadcom support cases, and upload the logs you generated to the portal as described in the KB article Uploading files to cases on the Broadcom Support Portal.