VCF Operations for Networks High Availability and Disaster Recovery Best Practices for clustered deployments



Article ID: 438049


Products

VCF Operations for Networks

Issue/Introduction

This article provides high availability and disaster recovery best practices for a clustered deployment, to increase resiliency against outages and data loss.

Environment

VCF Operations for Networks

Cause

To maintain resiliency, the product architecture requires periodic quiesced backups, which provide logically consistent restore points, together with proactive monitoring of GUI alerts so that transient issues do not cascade into system failures.

Resolution

 

  1. Establish a periodic backup cadence (e.g., weekly, monthly, or another interval appropriate for your organization).

  2. Shut down the cluster to a logically consistent state by following all steps UP TO BUT NOT INCLUDING the step that begins with "Take snapshots ..." in the Resolution section of KB 314428 - Best practices to shutdown VCF Operations for Networks Clustered deployments.

  3. Execute a full backup of all Platform and Collector Nodes using your standard backup regime.

  4. Restore cluster operations by following all steps AFTER the step that begins with "Take snapshots ..." in KB 314428 - Best practices to shutdown VCF Operations for Networks Clustered deployments.

  5. Routinely monitor the VCF Operations for Networks GUI by navigating to Settings > Infrastructure and Support > Infrastructure and Updates tab.

  6. Investigate generated alerts. If persistent problems are flagged on the Platform or Collector nodes, capture screenshots of the details and, to address issues before they escalate, proactively open a Support Case using the instructions in KB 142884 - Creating and managing Broadcom cases.
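The cadence in step 1 can be tracked with a simple date check. The following is a minimal sketch, not part of the product: the helper name `backup_due` and its parameters are hypothetical, and it assumes you record the date of the last completed backup yourself.

```python
from datetime import date, timedelta


def backup_due(last_backup: date, cadence_days: int, today: date) -> bool:
    """Return True when at least cadence_days have passed since last_backup.

    Hypothetical helper for illustration; cadence_days is whatever interval
    your organization has chosen (for example 7 for weekly).
    """
    if cadence_days <= 0:
        raise ValueError("cadence_days must be positive")
    return today >= last_backup + timedelta(days=cadence_days)
```

For example, with a weekly cadence and a last backup on 2025-01-01, `backup_due(date(2025, 1, 1), 7, date(2025, 1, 8))` returns `True`, while the same call for 2025-01-07 returns `False`.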

 

Additional Information

A simple (non-clustered) deployment has only one Platform node.

The Resolution steps above are therefore revised as follows:

  1. Establish a periodic backup cadence (e.g., weekly, monthly, or another interval appropriate for your organization).

  2. Shut down the Collector node(s) using the vCenter Power > Shut Down Guest OS action. If there is more than one Collector node, the order does not matter.

  3. Shut down the Platform node using the vCenter Power > Shut Down Guest OS action.

  4. Execute a full backup of the Platform and Collector node(s) using your standard backup regime.

  5. Power on the Platform node.

  6. Power on the Collector node(s). If there is more than one Collector node, the order does not matter.

  7. Routinely monitor the VCF Operations for Networks GUI by navigating to Settings > Infrastructure and Support > Infrastructure and Updates tab.

  8. Investigate generated alerts. If persistent problems are flagged on the Platform or Collector nodes, capture screenshots of the details and, to address issues before they escalate, proactively open a Support Case using the instructions in KB 142884 - Creating and managing Broadcom cases.
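The ordering in steps 2 through 6 of the simple deployment procedure can be written out as an explicit plan, which is useful for a runbook or change ticket. The following is a minimal sketch under stated assumptions: the function name `simple_deployment_backup_plan` and the node names are hypothetical, and the code only lists the actions in order. It does not call vCenter or any backup tool.

```python
from typing import List, Tuple


def simple_deployment_backup_plan(
    platform: str, collectors: List[str]
) -> List[Tuple[str, str]]:
    """Return (action, node) pairs in the order described in steps 2-6.

    Hypothetical helper for illustration only; it performs no actions.
    """
    plan: List[Tuple[str, str]] = []
    # Step 2: Collector nodes go down first (order among them does not matter).
    for collector in collectors:
        plan.append(("shut down guest OS", collector))
    # Step 3: the single Platform node goes down after the Collectors.
    plan.append(("shut down guest OS", platform))
    # Step 4: back up every node while all of them are powered off.
    for node in [platform, *collectors]:
        plan.append(("back up", node))
    # Step 5: the Platform node comes up first.
    plan.append(("power on", platform))
    # Step 6: Collector nodes come up last (order among them does not matter).
    for collector in collectors:
        plan.append(("power on", collector))
    return plan


if __name__ == "__main__":
    steps = simple_deployment_backup_plan("platform-1", ["collector-1", "collector-2"])
    for number, (action, node) in enumerate(steps, start=1):
        print(f"{number}. {action}: {node}")
```

The plan mirrors the shutdown order on power-on in reverse: Collectors stop before the Platform node, and the Platform node starts before the Collectors.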