Issues caused by analytic nodes in multiple geographic locations

Article ID: 406325

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • The delete snapshot job in Automation Central fails daily
  • Reports, such as the oversized VM report, are generated empty
  • Several VMs in the inventory are not collecting data and show gray question marks on the Summary page
  • One or more analytic nodes (primary, primary replica, or data) show a status of Waiting for Analytics or Offline on the Administration > Control Panel > Cluster Management page in the Product UI or on the System Status tab of the Admin UI

Environment

VMware Aria Operations 8.x

Cause

An analytic node (primary replica or data) was deployed in a different geographic location than the primary node. The round-trip network latency between the primary node and the other analytic node(s) exceeds the 5 ms maximum, causing services to become unstable or crash.
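
To get a rough idea of whether a node is within the 5 ms limit, the round-trip time can be approximated by timing TCP connections from the primary node to each other analytic node. The following is a minimal sketch, not a VMware-provided tool. The function names, the use of port 443, and the choice of the median of five attempts are assumptions made for illustration. A TCP connect includes some connection overhead, so treat the result as an estimate.

```python
import socket
import statistics
import time

LATENCY_LIMIT_MS = 5.0  # maximum round-trip latency stated in the Cause section


def tcp_rtt_ms(host, port=443, timeout=2.0):
    """Time one TCP connect to host:port, in milliseconds.

    A TCP handshake takes roughly one network round trip, so this
    approximates the round-trip latency. Port 443 is an assumption.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def exceeds_limit(samples_ms, limit_ms=LATENCY_LIMIT_MS):
    """Return True if the median of the samples is above the limit."""
    return statistics.median(samples_ms) > limit_ms


def check_node(host, attempts=5):
    """Measure a node several times; return (median_ms, over_limit)."""
    samples = [tcp_rtt_ms(host) for _ in range(attempts)]
    return statistics.median(samples), exceeds_limit(samples)
```

Run from the primary node, a call such as `check_node("data-node-1.example.com")` (a hypothetical host name) returns the median connect time and whether it is above the limit. The median is used so that a single slow attempt does not skew the result.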

Resolution

Use the Shrink Cluster feature to remove the analytic node(s) located in a different geographic region.

Note: Using the Shrink Cluster feature will not result in any data loss.

  1. Create cluster offline snapshots of all Aria Operations nodes (including cloud proxies)

    Snapshot Creation in VMware Aria Operations
  2. Log in to the Admin UI at https://[primary_node_ip_or_fqdn]/admin
  3. If the Cluster Status is not Online, click the BRING CLUSTER ONLINE button and wait for the Cluster Status to show Online 
  4. Click the Shrink Cluster button
  5. Select the Data Node(s) to remove from the cluster and click Next

    Note:
    Verify there is enough Free Space on the node(s) remaining in the cluster to accommodate the Used Space from the node(s) being removed.
  6. If one of the nodes selected in Step 5 was a Primary Replica node, select a Data node to become the new Primary Replica node and click Next
  7. Review the list of adapters that will be relocated as part of the process, then select the "I understand the risk" checkbox and click Next
  8. Type a Reason in the Finalize dialog, then click Shrink Cluster
  9. Wait for the Data migration to complete and the cluster to automatically restart to remove the node(s)

    Note:
    The data migration step can take many hours, depending on the amount of data residing on the nodes being removed and the network bandwidth available between those nodes and the remaining nodes.
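
The free-space check from Step 5 and the migration-time note above both come down to simple arithmetic, sketched below. This is an illustration only. The function names and the sample figures are hypothetical, and the values must be read manually from the UI. Comparing totals is a simplification, since the product may not spread data evenly across the remaining nodes. The time estimate is a lower bound that ignores protocol overhead and other traffic.

```python
def can_absorb(remaining_free_gb, removed_used_gb):
    """Compare total Free Space on remaining nodes with total Used Space
    on the nodes being removed.

    Both arguments map a node name to a size in GB.
    Returns (ok, shortfall_gb).
    """
    total_free = sum(remaining_free_gb.values())
    total_used = sum(removed_used_gb.values())
    shortfall = max(0.0, total_used - total_free)
    return shortfall == 0, shortfall


def min_transfer_hours(data_gb, bandwidth_mbps):
    """Lower bound on transfer time: data size divided by link speed.

    Uses decimal units (1 GB = 8000 megabits).
    """
    seconds = data_gb * 8000 / bandwidth_mbps
    return seconds / 3600


# Hypothetical figures for illustration
ok, shortfall = can_absorb({"node-a": 500, "node-b": 300}, {"node-c": 600})
hours = min_transfer_hours(600, 1000)  # 600 GB over a 1000 Mbps link
```

If `ok` is False, `shortfall` gives the additional space in GB that the remaining nodes would need before the shrink can proceed safely.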

Additional Information