Utilizing HA switching to replace the Primary node in Aria Operations

Article ID: 341309

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

This article describes how to replace the Primary node without redeploying Aria Operations.

Warning! The resolution in this KB will result in historical data loss if the FSDB syncs fail to complete between steps.

Depending on the amount of data, this process can take hours or days to complete. Most of that time is spent waiting for FSDB to sync between steps.

Environment

VMware Aria Operations 8.x

Resolution

Mandatory: Take snapshots - Snapshot Creation in VMware Aria Operations
Snapshots are required on all analytics (Primary, Replica, and Data) nodes in the cluster before following the steps below. It is not necessary to take snapshots of Cloud Proxies.

 
Because there is no way to deploy a new Primary node directly, HA can be used to replace the Primary node.
For simplicity, the old Primary node is referred to as M1 and the new Primary node as M2.
 

HA currently enabled

  1. Deploy a new data node and add it to the Aria Operations cluster using the configuration desired for the new Primary node. This node will be referred to as M2.
  2. Log in to the Aria Operations admin UI.
  3. Click Disable to disable HA and wait for the cluster to go back online.
  4. Wait for the FSDB status on all analytics nodes to show Running.
     Note: Use the following command to check the status of the FSDB sync:
     $VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo | sed -nre '/stateMappings/,/}$/p'
  5. In the admin UI, confirm that the cluster now has a Primary node (M1) and no Replica node.
  6. Click Enable to enable HA, select M2 as the Replica node, and wait for the cluster to go back online.
  7. Wait for the FSDB status on all analytics nodes to show Running.
     Note: Use the following command to check the status of the FSDB sync:
     $VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo | sed -nre '/stateMappings/,/}$/p'
  8. Select M1 and click Take Node Offline/Online to take this node offline. Wait for HA to promote M2 to be the new Primary node.
  9. Select M1 and click Take Node Offline/Online to bring this node back online. Wait for HA to bring M1 back as the new Replica node.
  10. Click Disable to disable HA and wait for the cluster to go back online.
  11. Wait for the FSDB status on all analytics nodes to show Running.
     Note: Use the following command to check the status of the FSDB sync:
     $VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo | sed -nre '/stateMappings/,/}$/p'
  12. M2 should now be the Primary node, and M1 should be a data node.
  13. M1 can now be removed from the cluster, and HA can be re-enabled with the desired Replica node.
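
Because each FSDB wait in this procedure can last hours, it may be convenient to poll rather than re-run the check by hand. The following is a minimal sketch of a generic polling helper and is not part of the product. The names wait_until and check_fsdb are hypothetical, and the readiness check itself is left to the caller because the exact output of the sync command may differ between versions.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL_SECONDS until it exits 0 (returns 0),
# or gives up after roughly TIMEOUT_SECONDS of waiting (returns 1).
# Hypothetical helper for illustration; not shipped with Aria Operations.
wait_until() {
  _wu_timeout=$1
  _wu_interval=$2
  shift 2
  _wu_elapsed=0
  while ! "$@"; do
    if [ "$_wu_elapsed" -ge "$_wu_timeout" ]; then
      return 1
    fi
    sleep "$_wu_interval"
    _wu_elapsed=$((_wu_elapsed + _wu_interval))
  done
  return 0
}

# Example usage (commented out). check_fsdb is a function you would define
# yourself so that it exits 0 only when the FSDB sync command shows Running
# for every analytics node:
#   wait_until 86400 300 check_fsdb && echo "FSDB sync complete"
```

The elapsed time counts only the sleep intervals, so the real timeout is somewhat longer when the check itself is slow. A non-zero return means the wait timed out, not that the sync failed, so the status should still be inspected manually before moving to the next step.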

HA currently disabled

Note: In a non-HA cluster where the Primary node is completely down and unrecoverable, the steps below cannot be followed. A cluster with a non-functioning Primary node is considered unrecoverable, and the cluster must be redeployed.
The steps below can be used when the Primary node is functional but is experiencing problems that require the Primary node to be redeployed.

  1. Deploy a new data node and add it to the Aria Operations cluster using the configuration desired for the new Primary node. This node will be referred to as M2.
  2. Log in to the Aria Operations admin UI.
  3. Click Enable to enable HA, select M2 as the Replica node, and wait for the cluster to go back online.
  4. Wait for the FSDB status on all analytics nodes to show Running.
     Note: Use the following command to check the status of the FSDB sync:
     $VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo | sed -nre '/stateMappings/,/}$/p'
  5. Select M1 and click Take Node Offline/Online to take this node offline. Wait for HA to promote M2 to be the new Primary node.
  6. Select M1 and click Take Node Offline/Online to bring this node back online. Wait for HA to bring M1 back as the new Replica node.
  7. Click Disable to disable HA and wait for the cluster to go back online.
  8. Wait for the FSDB status on all analytics nodes to show Running.
     Note: Use the following command to check the status of the FSDB sync:
     $VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo | sed -nre '/stateMappings/,/}$/p'
  9. M2 should now be the Primary node, and M1 should be a data node.
  10. M1 can now be removed from the cluster.
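
The FSDB status checks in the steps above are done by reading the command output by eye. As a rough sketch, the output could instead be piped through a small filter that succeeds only when Running appears and no non-ready state does. The function name fsdb_all_running and the state names in FSDB_PENDING_PATTERN are assumptions made for illustration. They are not taken from this article, so compare them with what the command prints in your environment before relying on the result.

```shell
# ASSUMPTION: names of states that mean "not ready yet". These are
# illustrative guesses; adjust the pattern to match your actual output.
: "${FSDB_PENDING_PATTERN:=SYNCHING|OUT_OF_SYNC|BALANCING}"

# fsdb_all_running: reads the FSDB status text on stdin.
# Exits 0 only if "Running" appears at least once (case-insensitive)
# and nothing matches FSDB_PENDING_PATTERN; exits 1 otherwise.
# Hypothetical helper; not shipped with Aria Operations.
fsdb_all_running() {
  _fsdb_text=$(cat)
  # Empty or unexpected output must never count as "ready".
  printf '%s\n' "$_fsdb_text" | grep -qi 'running' || return 1
  if printf '%s\n' "$_fsdb_text" | grep -Eqi "$FSDB_PENDING_PATTERN"; then
    return 1
  fi
  return 0
}

# Example usage (commented out), run where the article's command is run:
#   $VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo \
#     | sed -nre '/stateMappings/,/}$/p' | fsdb_all_running && echo "FSDB ready"
```

The filter deliberately treats empty output as not ready, so a failed command cannot be mistaken for a completed sync. Because the pending-state pattern is a guess, a passing result should be confirmed against the raw output at least once.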