HCX - Bulk Migration & Replication Assisted vMotion (RAV) scalability guide - for HCX version 4.10 or higher

Article ID: 373010

Updated On:

Products

VMware HCX

Issue/Introduction

This document describes the functional capacity for migrations using vSphere Replication (vSR) Bulk and Replication Assisted vMotion (RAV) in HCX.

The supported scale numbers are referenced per HCX Manager, irrespective of the number of Site Pairings or Service Mesh/IX Appliances deployed.

A configuration guide is provided within this document to increase the scale of concurrent Bulk/RAV migrations per HCX Manager beyond the default value for HCX Manager systems running HCX software version 4.10 or higher. For HCX Manager systems running HCX version 4.7 to 4.9 the configuration guide procedure is documented in Knowledge Article 321604 "HCX - Bulk Migration & Replication Assisted vMotion (RAV) scalability guide - for HCX version 4.7 to 4.9".

 

Considerations for Concurrent Migration

There are several factors, at both source & target HCX Manager, that can limit the number of concurrent migrations performed using Bulk & RAV (initial/delta sync):

  • Data storage
    • IOPS capacity.
    • Shared vs. dedicated.
  • Host resources
    • Overall ESXi host resources for all services.
    • CPU & MEM reservations for the IX appliance VM.
    • pNIC/VMNIC capacity and shared load.
    • Dedicated vmk interfaces for different services like mgmt/vMotion/vSR.
  • Network Infrastructure throughout the entire data path
    • Data Center local network.
    • Service Provider network infrastructure between source/target sites.
    • Bandwidth availability.
    • Latency and path reliability (packet loss).
      • vSphere Replication (vSR) performance drops exponentially with higher packet loss and/or higher latency.
      • vSphere Replication has a built-in tolerance for high latency, but throughput will be reduced significantly.

Note: HCX Transport Analytics can be used to measure network infrastructure throughput during the migration planning phase. Refer to the Broadcom HCX User Guide.

  • Workload VM conditions
    • Number of disks.
    • Total size and size per disk.
    • Active services/applications.
    • Data churning/disk changes.

 

Default (Baseline) HCX Manager Resource Allocation:

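| vCPU | RAM (GB) | Disk Size (GB) | Concurrent Bulk/RAV Migrations per HCX Manager |
|------|----------|----------------|------------------------------------------------|
| 4    | 12       | 64             | 300                                            |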

 

The supported numbers for concurrent Bulk/RAV migrations per Baseline HCX Manager deployments are:

  • 300 concurrent migrations per Manager.
  • 200 concurrent migrations per Service Mesh/IX Appliance.
  • 1 Gbps max per migration workflow.
  • 1.6 Gbps max per IX appliance (any number of concurrent migration workflows).
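
As a rough planning aid, the per-appliance limits above can be turned into simple arithmetic. The following is a minimal sketch, not an HCX tool: the function names are hypothetical, and the even split of the per-appliance bandwidth ceiling across workflows is an assumption made for illustration, not documented HCX behavior.

```shell
#!/bin/sh
# Illustrative planning arithmetic based on the baseline limits listed above.
# Function names are hypothetical; they are not part of HCX.

IX_MAX_CONCURRENT=200     # concurrent migrations per Service Mesh/IX appliance
IX_MAX_MBPS=1600          # 1.6 Gbps ceiling per IX appliance
WORKFLOW_MAX_MBPS=1000    # 1 Gbps ceiling per migration workflow

# Minimum number of IX appliances needed to host N concurrent migrations
# (ceiling division; a lower bound that ignores bandwidth and host limits).
min_ix_appliances() {
    echo $(( ($1 + IX_MAX_CONCURRENT - 1) / IX_MAX_CONCURRENT ))
}

# Upper bound, in Mbps, on per-workflow throughput when N workflows share
# one IX appliance, ASSUMING the appliance ceiling is split evenly.
max_mbps_per_workflow() {
    share=$(( IX_MAX_MBPS / $1 ))
    if [ "$share" -gt "$WORKFLOW_MAX_MBPS" ]; then
        share=$WORKFLOW_MAX_MBPS
    fi
    echo "$share"
}
```

For example, `min_ix_appliances 300` prints 2, and `max_mbps_per_workflow 200` prints 8. These are upper bounds only; storage, host, and network conditions described earlier can reduce real throughput well below them.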

Resolution

The following configuration guide is provided to increase the scale of concurrent Bulk/RAV migrations per HCX Manager beyond the default value for HCX Manager systems running HCX version 4.10 or higher. For HCX Manager systems running HCX version 4.7 to 4.9 the configuration guide procedure is documented in Knowledge Article 321604 "HCX - Bulk Migration & Replication Assisted vMotion (RAV) scalability guide - for HCX version 4.7 to 4.9".

HCX Manager software version 4.10 or newer provides configuration settings for increased scale of concurrent Bulk/RAV migrations based on default, medium, and large size settings on each HCX Manager. These form factors include pre-defined settings for disk space, memory allocation for the app-engine software process and the number of threads for different sizes of migrations.

The supported scale numbers are referenced per HCX Manager, irrespective of the number of Site Pairings or Service Mesh/IX Appliances deployed.

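| Scale Form Factor | vCPU Count | Memory (GB) | Storage (GB) | Concurrent Bulk/RAV Migrations per HCX Manager |
|-------------------|------------|-------------|--------------|------------------------------------------------|
| Default           | 4          | 12          | 64           | 300                                            |
| Medium            | 8          | 24          | 120          | 600                                            |
| Large             | 16         | 48          | 300          | 1000                                           |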
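
When planning a migration wave, the table can be read as a lookup from the required concurrency to the smallest form factor that covers it. The sketch below encodes only the values from the table; `pick_form_factor` and `form_factor_resources` are hypothetical helper names, not HCX commands.

```shell
#!/bin/sh
# Illustrative lookup over the scale form factor table above.
# Both helpers are hypothetical; they are not shipped with HCX.

# Print the smallest form factor whose per-Manager limit covers the
# requested number of concurrent Bulk/RAV migrations.
pick_form_factor() {
    wanted=$1
    if [ "$wanted" -le 300 ]; then
        echo "default"
    elif [ "$wanted" -le 600 ]; then
        echo "medium"
    elif [ "$wanted" -le 1000 ]; then
        echo "large"
    else
        echo "unsupported: exceeds 1000 per HCX Manager" >&2
        return 1
    fi
}

# Print the vCPU count, memory (GB) and storage (GB) for a form factor.
form_factor_resources() {
    case "$1" in
        default) echo "4 12 64" ;;
        medium)  echo "8 24 120" ;;
        large)   echo "16 48 300" ;;
        *)       echo "unknown form factor: $1" >&2; return 1 ;;
    esac
}
```

For example, `pick_form_factor 450` prints `medium`, and `form_factor_resources medium` prints `8 24 120`.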

 

Scenarios of Upscale Configuration

Case 1: HCX Manager upgraded from 4.7-4.9 to 4.10.0.0 where the HCX version 4.7 to 4.9 upscale procedure documented in Knowledge Article 321604 "HCX - Bulk Migration & Replication Assisted vMotion (RAV) scalability guide - for HCX version 4.7 to 4.9" was not applied previously.

  • Scale form factor is not set.
  • User must increase VM compute and storage to medium or large scale form factor as in section A) below.
  • User can execute upscale_configs.sh script to medium or large scale form factor as in section B) below.

Case 2: HCX Manager upgraded from 4.7-4.9 to 4.10.0.0 where the HCX version 4.7 to 4.9 upscale procedure documented in Knowledge Article 321604 "HCX - Bulk Migration & Replication Assisted vMotion (RAV) scalability guide - for HCX version 4.7 to 4.9" was applied previously

Case 3: HCX Manager is newly deployed with default scale form factor

  • User must increase VM compute and storage to medium or large scale form factor as in section A) below.
  • User can execute upscale_configs.sh script to medium or large scale form factor as in section B) below.

Case 4: HCX Manager is upgraded from 4.10.0.0 to 4.10+ with scale form factor not applied

  • User must increase VM compute and storage to medium or large scale form factor as in section A) below.
  • User can execute upscale_configs.sh script to medium or large scale form factor as in section B) below.

Case 5: HCX Manager is upgraded from 4.10.0.0 to 4.10+ with scale form factor applied

  • HCX Manager will retain the already applied settings after upgrade unless improved configurations are defined for the predefined scale form factors.
  • User must increase VM compute and storage to medium or large scale form factor as in section A) below.
  • User can execute upscale_configs.sh script to medium or large scale form factor as in section B) below.

 

Steps to perform upscale configuration of HCX Managers

The following steps should be performed based on the matching use case from the previous section of this guide.

The steps should be performed on both the HCX Connector Manager VM & HCX Cloud Manager VM running HCX software version 4.10 or higher.

 

Section A) Ensure each HCX Manager has the appropriate CPU/Memory/Disk space for the required scale form factor

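| Scale Form Factor | vCPU Count | Memory (GB) | Storage (GB) | Concurrent Bulk/RAV Migrations per HCX Manager |
|-------------------|------------|-------------|--------------|------------------------------------------------|
| Default           | 4          | 12          | 64           | 300                                            |
| Medium            | 8          | 24          | 120          | 600                                            |
| Large             | 16         | 48          | 300          | 1000                                           |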

 

Procedure to increase resources on the HCX Connector/Cloud Manager

The following procedure must be used to increase resource allocation on both the HCX Connector Manager VM & HCX Cloud Manager VM.

Requirements and Considerations before increasing resources on the HCX Connector & Cloud Manager

  • Do NOT exceed recommended allocations as that may cause the HCX Connector/Cloud Manager to malfunction.
  • Both HCX Cloud Manager and HCX Connector must be running HCX version 4.10.0 or newer.
  • There should be NO active migration or configuration workflows when making these resource changes.
  • Changes must be made during a scheduled Maintenance Window.
  • There is NO impact to Network Extension services.
  • There is NO change of concurrency for HCX vMotion/Cold Migration workflow.
  • The concurrent migration limit specified for HCX Replication Assisted vMotion (RAV) applies ONLY to Initial & Delta sync. During the RAV switchover stage, only one relocation is serviced at a time, on a serial basis.
  • Additional Service Meshes/IX appliances should be deployed for distinct workload clusters to aggregate the replication capacity of multiple IX appliances. A different Service Mesh can be deployed for each workload cluster at the source and/or target.
  • If there are multiple Service Meshes/IX appliances, RAV switchovers can run in parallel across them; within each Service Mesh/IX pair, switchover is always sequential.

Procedure

IMPORTANT: If your HCX Managers fall under Case 2 of Scenarios of Upscale Configuration, this Section A) can be skipped.

IMPORTANT: It is recommended to take snapshots of the HCX Connector & Cloud Manager VMs before executing these steps.

Step 1: Increase the vCPU and memory of HCX Manager to match the desired scale form factor in the above table.

 

Step 2: Add a 120 GB or 300 GB storage disk to the HCX Connector & Cloud Manager based on the desired scale form factor in the above table, and then increase the common disk partition size. For HCX Managers with previously modified 300 GB disk settings from Knowledge Article 321604 "HCX - Bulk Migration & Replication Assisted vMotion (RAV) scalability guide - for HCX version 4.7 to 4.9", steps are still required to increase the common disk partition size.

Refer to Knowledge Article 373238 "HCX - Increasing HCX Manager Disk Space for HCX Software Version 4.10 or Higher" for the procedure.

 

Section B) Execute upscale_configs.sh script to medium or large scale form factor

Step 1: Log in to the HCX Manager SSH console as the 'admin' user.

 

Step 2: Switch to 'root' user.

 

Step 3: Change directory to '/usr/local/hcx/sbin'.

 

Step 4: Execute upscale_configs.sh using one of the commands below (the app-engine software process will be restarted automatically).

sh upscale_configs.sh medium

OR

sh upscale_configs.sh large

 

Step 5: Wait until the app-engine software process has restarted completely before accessing the HCX UI to perform operations. The service status can be checked with the command below.

systemctl status app-engine

Step 6: To confirm the scale setting of medium or large use the below command.

cat /common/scale-form-factor
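
Steps 4 to 6 can be strung together in a small wrapper, run as root on the HCX Manager. This is a minimal sketch rather than a supported tool: the helper names, the 10-second poll interval, and the 600-second timeout are assumptions, and only the script path, the `app-engine` service name, and `/common/scale-form-factor` come from the steps above.

```shell
#!/bin/sh
# Sketch of Steps 4-6 for an HCX Manager root shell. The helper names,
# the 10-second poll interval and the default timeout are assumptions.

# Poll until systemd reports app-engine as active, or give up after
# the number of seconds given in $1 (default 600).
wait_for_app_engine() {
    remaining=${1:-600}
    while [ "$remaining" -gt 0 ]; do
        if systemctl is-active --quiet app-engine; then
            return 0
        fi
        sleep 10
        remaining=$(( remaining - 10 ))
    done
    return 1
}

# Apply a form factor, wait for the restart, then show the recorded setting.
apply_scale_form_factor() {
    case "$1" in
        medium|large) ;;
        *) echo "usage: apply_scale_form_factor medium|large" >&2; return 2 ;;
    esac
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd /usr/local/hcx/sbin && sh upscale_configs.sh "$1" ) || return 1
    wait_for_app_engine 600 || { echo "app-engine not active yet" >&2; return 1; }
    cat /common/scale-form-factor
}
```

A call such as `apply_scale_form_factor medium` would then replace the manual commands. Note that an `active` state in systemd only indicates that the process is running; the HCX UI may need additional time before it responds.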

 

Required Steps During Future HCX Manager Upgrade

  • The steps performed in this scalability procedure are not persisted after an HCX Manager upgrade. The following scalability configuration steps must be performed after a software upgrade.
  • After upgrade, the user mapping for the postgres service may change. This may cause the postgres service to fail to start. The steps below need to be performed after the successful upgrade of the HCX Manager.

NOTE: These steps are only required for HCX Manager systems that were scaled up prior to the upgrade.

  • Before upgrade with scale settings implemented:

root@hcx [ /common_ext ]# ls -l | grep postgres

drwx------ 19 postgres postgres  4096 Oct 31 11:33 postgres-db

 

  • After upgrade on the same system:

root@hcx [ /common_ext ]# ls -l

drwx------ 19     1001 appmgmt   4096 Oct 31 11:33 postgres-db

NOTE: The user (1001) and group (appmgmt) mappings shown are arbitrary.

 

  • Change the owner and group of the postgres-db directory:

root@hcx [ /common_ext ]# chown -R postgres:postgres postgres-db

root@hcx [ /common_ext ]# ls -l

drwx------ 19 postgres postgres  4096 Oct 31 12:41 postgres-db
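
The before/after comparison and the fix can be combined so that `chown` runs only when the ownership has actually drifted. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and `stat -c` is the GNU coreutils form, which is assumed to be available on the HCX Manager.

```shell
#!/bin/sh
# Sketch of a post-upgrade ownership check for the postgres-db directory.
# Helper names are hypothetical; stat -c is the GNU coreutils form.

# Succeed if the owner:group of directory $1 matches the expected value $2.
ownership_matches() {
    [ "$(stat -c '%U:%G' "$1")" = "$2" ]
}

# Restore postgres:postgres on the directory only when it has drifted.
fix_postgres_db_ownership() {
    dir=${1:-/common_ext/postgres-db}
    if ownership_matches "$dir" "postgres:postgres"; then
        echo "ownership already correct: $dir"
    else
        chown -R postgres:postgres "$dir" && echo "ownership restored: $dir"
    fi
}
```

Running `fix_postgres_db_ownership` as root after an upgrade leaves a correctly owned directory untouched and otherwise applies the same `chown -R postgres:postgres` shown above.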

 

Recommendations when Operating Concurrent Migrations at Scale

  • As a best practice, use vSphere Monitoring and Performance to monitor HCX Connector & Cloud Manager CPU utilization and MEM usage.
  • Do NOT exceed the recommended limits as that could cause system instability and failed migration workflows.
  • In a scaled-up environment, when migration operations are being processed, expect CPU utilization to increase significantly for short periods of time; there may also be a temporary delay in the UI response for migration progress events.
  • Limit concurrent MON configuration changes on the target cloud while active concurrent Bulk migrations into MON-enabled segments are in switchover.
  • Follow the migration events and estimation on the HCX UI to determine any slowness that may be caused by the infrastructure or the network.
  • Additionally, vSphere Replication status can be monitored from the source ESXi host. Refer to Broadcom Knowledge Article 323663 "HCX - Bulk Migration Operations and Best Practices".
  • If a source ESXi host is heavily loaded in terms of memory or I/O rate, replication performance will be affected. As a result, a Bulk/RAV workflow may take more time to complete the initial base sync even when there is no slowness in the underlying datapath.

Note: In such cases, the recommendation is to relocate the source VM's compute resources to another, preferably less loaded, ESXi host using native vCenter vMotion. This action does not impact the ongoing replication process and does not require any changes in the migration workflow.

  • The Bulk/RAV migration workflow consists of multiple stages (e.g. initial/delta sync, off-line sync, disk consolidation, data checksum, VM instantiation) and most do not depend on the network infrastructure. The time to complete a migration for any given VM, from start to finish, may therefore vary with conditions; it is not a simple calculation based on the size of the VM and the assumed network bandwidth.

Additional Information

Refer to HCX Configuration Limits.

Refer to Network Underlay Characterization for more information about HCX dependencies on the network infrastructure between sites.

Refer to HCX Bulk Migration Operations & Best Practices.

Contact your Cloud Provider regarding the availability of this procedure to scale up your cloud Data Center.

For scale-up requirements on VMware Cloud on AWS, please open a service request with the Broadcom Support team.