HCX - Mobility Optimized Networking (MON) scalability guide

Products

VMware HCX
VMware Cloud on AWS

Issue/Introduction

This document describes the functional capacity when operating the Mobility Optimized Networking (MON) feature in HCX.
The supported scale numbers apply per HCX Cloud Manager, irrespective of the number of Site Pairings or Service Meshes deployed.

Considerations when operating MON

There are several factors that limit the number of VMs that can have MON enabled at the same time while operating HCX for migration and network extension:

VM resources on the HCX Cloud Manager
Number of extended networks with MON enabled
Number of VMs with MON enabled
Number of VMs for which MON is being enabled/disabled at the same time
Number of VMs with network events that require MON configuration changes at the same time (e.g. during a host failure event, vSphere Distributed Resource Scheduler will relocate a certain number of VMs)

Default HCX Cloud Manager Resource Allocation: 4 vCPUs & 12 GB RAM

The supported numbers for MON in an HCX Cloud Manager with the default resource allocation are:

HCX version 4.5 or higher:

400 VMs with MON enabled at any given time
100 Network Extensions with MON enabled
200 concurrent MON operations:
- Enable or disable MON on a VM
- Bulk migrations to MON enabled segments
- Combination of both

HCX version 4.4 and earlier:

250 VMs with MON enabled at any given time
100 Network Extensions with MON enabled
100 concurrent MON operations:
- Enable or disable MON on a VM
- Bulk migrations to MON enabled segments
- Combination of both

Resolution

Scale up MON

To improve MON scalability, resources on the HCX Cloud Manager must be increased to:
Extended HCX Cloud Manager Resource Allocation: 8 vCPUs & 24 GB RAM

The maximum supported numbers for MON in an HCX Cloud Manager with the extended resource allocation are:

HCX version 4.5 or higher:

1000 VMs with MON enabled at any given time
100 Network Extensions with MON enabled
200 concurrent MON operations:
- Enable or disable MON on a VM
- Bulk migrations to MON enabled segments
- Combination of both

HCX version 4.4 and earlier:

900 VMs with MON enabled at any given time
100 Network Extensions with MON enabled
100 concurrent MON operations:
- Enable or disable MON on a VM
- Bulk migrations to MON enabled segments
- Combination of both
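
The VM limits listed in this article can be captured in a small lookup, for example as a guard in a runbook script. The following is a minimal sketch and is not part of HCX: the function name mon_vm_limit and the profile labels "default" (4 vCPUs & 12 GB RAM) and "extended" (8 vCPUs & 24 GB RAM) are hypothetical, and the version argument is assumed to be in major.minor or major.minor.patch form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the supported number of MON-enabled VMs
# taken from this article.
# Usage: mon_vm_limit <hcx_version> <default|extended>
mon_vm_limit() {
  local version="$1" profile="$2"
  local major="${version%%.*}"   # text before the first dot
  local rest="${version#*.}"     # text after the first dot
  local minor="${rest%%.*}"      # second version component
  local newer=0                  # 1 when the version is 4.5 or higher

  if (( major > 4 || (major == 4 && minor >= 5) )); then
    newer=1
  fi

  case "$profile:$newer" in
    default:1)  echo 400 ;;
    default:0)  echo 250 ;;
    extended:1) echo 1000 ;;
    extended:0) echo 900 ;;
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac
}
```

For example, mon_vm_limit 4.4.2 extended prints 900. A planned VM count can then be compared against the printed value before MON is enabled on more VMs.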

Increase resources on the HCX Cloud Manager

The following procedure must be used to increase resource allocation on the HCX Cloud Manager VM.

Requirements and Considerations before increasing resources on the HCX Cloud Manager

Do NOT exceed recommended allocations as that may cause the HCX Cloud Manager to malfunction.
Both HCX Cloud Manager and Connector must be running version HCX 4.3.2 or later.
No specific vCenter or NSX version is required, but all compatibility restrictions still apply. Refer to the HCX User Guide.
There should be NO active migration or configuration workflows when making these resource changes.
Changes must be made during a scheduled Maintenance Window.
There is NO impact to Network Extension services.
If MON is already enabled, events will NOT be detected while the HCX Cloud Manager remains out of service.
There is NO need to increase the disk size.
Increasing resource allocation on the HCX NE appliances has no effect on MON scalability.
In a VMC environment, no vCPU resources are reserved in advance for the HCX Cloud Manager in the MgmtResourcePool associated with a given SDDC. The user must therefore ensure that the additional vCPU resources are always available in the MgmtResourcePool, both before and after the scale up, to sustain the MON scale-up requirement for the HCX Cloud Manager.

Procedure

Validate that the HCX Cloud Manager is running version 4.3.2 or later.

Pre-Checks:

Log in to the HCX Cloud Manager admin CLI, start ccli, and run list:

admin@hcx [ ~ ]$ ccli
Welcome to HCX Central CLI
[admin@hcx] list
|-------------------------------------------------------------------|
| Node          | Id | Address          | State     | Selected      |
|-------------------------------------------------------------------|
| Example-IX-R1 | 0  |#.#.#.#:9443      | Connected |               |
|-------------------------------------------------------------------|
| Example-NE-R1 | 1  |#.#.#.#:9443      | Connected |               |
|-------------------------------------------------------------------|

Check that the following services are running on the HCX Cloud Manager:

admin@hcx [ ~ ]$ systemctl --type=service | grep "zoo\|kaf\|web\|app\|postgres"
  app-engine.service                   loaded active     running       App-Engine                                                        
  appliance-management.service         loaded active     running       Appliance Management                                              
  kafka.service                        loaded active     running       Kafka                                                             
  postgresdb.service                   loaded active     running       PostgresDB                                                                                              
  web-engine.service                   loaded active     running       WebEngine                                                         
  zookeeper.service                    loaded active     running       Zookeeper
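
The service check can also be scripted so that a missing or stopped service stands out. The following is a minimal sketch, not an HCX tool: the function name check_hcx_services is hypothetical, and the list contains only the six services shown in the sample output above. It reads systemctl output on standard input and reports any of those services that is not "loaded active running".

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `systemctl --type=service` output on stdin and
# prints each required service that is not in the "loaded active running"
# state. Returns 0 when all six are running, 1 otherwise.
check_hcx_services() {
  local required="app-engine appliance-management kafka postgresdb web-engine zookeeper"
  local input svc rc=0
  input="$(cat)"   # capture stdin once so it can be searched per service

  for svc in $required; do
    if ! printf '%s\n' "$input" |
        grep -Eq "^[[:space:]]*${svc}\.service[[:space:]]+loaded[[:space:]]+active[[:space:]]+running"; then
      echo "NOT RUNNING: ${svc}.service"
      rc=1
    fi
  done
  return "$rc"
}
```

Typical use on the HCX Cloud Manager would be: systemctl --type=service | check_hcx_services && echo "all six services are running"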

Log in to the vCenter that hosts the HCX Cloud Manager.

Note: If the HCX Cloud Manager is already running with the extended compute resources, do not repeat the resource change. This avoids an unnecessary reboot of the HCX Cloud Manager.

Shut down the VM's guest OS using the vCenter UI.
Edit the HCX Cloud Manager VM to increase the vCPU and memory reservations. Refer to:
- Virtual CPU Configuration
- Virtual Memory Configuration
Power ON the HCX Cloud Manager VM.
Access the HCX Appliance Management Interface (port 9443) to ensure all services are running.

Post-Checks:

Log in to the HCX Cloud Manager admin CLI, start ccli, and run list:

admin@hcx [ ~ ]$ ccli
Welcome to HCX Central CLI
[admin@hcx] list
|-------------------------------------------------------------------|
| Node          | Id | Address       | State     | Selected         |
|-------------------------------------------------------------------|
| Example-IX-R1 | 0  |#.#.#.#:9443   | Connected |                  |
|-------------------------------------------------------------------|
| Example-NE-R1 | 1  |#.#.#.#:9443   | Connected |                  |
|-------------------------------------------------------------------|
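
When many fleet appliances are listed, the State column can be checked mechanically. The following is a minimal sketch under stated assumptions: ccli is interactive, so the table printed by list is assumed to have been copied into a text file (the name nodes.txt is hypothetical), and the column layout is assumed to match the sample above. The function name disconnected_nodes is also hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the table printed by `list` inside ccli on
# stdin and prints the name of every node whose State is not "Connected".
# Separator rows and the header row are skipped.
disconnected_nodes() {
  awk -F'|' '
    NF >= 6 {
      node = $2; state = $5
      gsub(/^[ \t]+|[ \t]+$/, "", node)    # trim surrounding blanks
      gsub(/^[ \t]+|[ \t]+$/, "", state)
      if (node == "Node" || node == "") next
      if (state != "Connected") print node
    }'
}
```

Running disconnected_nodes < nodes.txt prints nothing when every listed node is Connected. Empty input also prints nothing, so confirm that the file actually contains the table.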

Check that the following services are running on the HCX Cloud Manager:

admin@hcx [ ~ ]$ systemctl --type=service | grep "zoo\|kaf\|web\|app\|postgres"
  app-engine.service                   loaded active     running       App-Engine                                                        
  appliance-management.service         loaded active     running       Appliance Management                                              
  kafka.service                        loaded active     running       Kafka                                                             
  postgresdb.service                   loaded active     running       PostgresDB                                                                                              
  web-engine.service                   loaded active     running       WebEngine                                                         
  zookeeper.service                    loaded active     running       Zookeeper

Important: If the HCX Cloud Manager fails to reboot, OR any of the services listed above fails to start, OR the fleet appliances (IX/NE/SDR) become disconnected, revert the configuration changes immediately and ensure the system comes back online.

Recommendations for operating MON at scale

As a best practice, use vSphere Monitoring and Performance to monitor HCX Cloud Manager CPU utilization and memory usage.
Do NOT exceed the recommended limits as that could cause system instability and failed workflows.
In a scaled-up environment, expect CPU utilization to increase significantly for short periods while MON operations are being processed; there may also be a temporary delay in UI response.
Limit the concurrency of MON operations when making configuration changes while Bulk migrations into MON-enabled segments are active.
The regular limit per HCX Manager applies to any other Network Extensions that do NOT have MON enabled.
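
One way to keep MON changes under the concurrency limit is to stage them in fixed-size batches and finish one batch before starting the next. The following is a minimal sketch of the batching step only; it does not call HCX. The function name batch_lines is hypothetical, the input is assumed to be one VM name per line, and the batch size is for the operator to choose (it should leave headroom for any active Bulk migrations).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one VM name per line on stdin and prints the
# names grouped into batches of at most <size>, each preceded by a
# "# batch N" marker line.
# Usage: batch_lines <size>
batch_lines() {
  local size="$1"
  case "$size" in
    ''|*[!0-9]*|0)
      echo "batch size must be a positive integer" >&2
      return 1
      ;;
  esac
  awk -v n="$size" '
    (NR - 1) % n == 0 { printf "# batch %d\n", (NR - 1) / n + 1 }
    { print }'
}
```

For example, batch_lines 50 < vms.txt (vms.txt being a hypothetical file of VM names) groups the names into batches of up to 50 VMs.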

Additional Information

Refer to HCX Configuration Maximums
Contact your Cloud Provider regarding the availability of this procedure to scale up MON for HCX in your SDDC.
This option is currently NOT available in VMC on Dell EMC.
For Scale up requirements on VMC on AWS Cloud, please open a support case with Broadcom Support and refer to this KB article.
For more information, see Creating and managing Broadcom support cases.