This article provides information about deploying vSphere Metro Storage Cluster with VMware Cloud Foundation 9.x.
Applies to: VMware Cloud Foundation 9.x
What is vMSC?
VMware vSphere® Metro Storage Cluster (vMSC) is a specific storage configuration that is commonly referred to as a stretched storage cluster or metro storage cluster. It is a single vSphere cluster that spans multiple physical sites.
A vMSC leverages two independent storage systems, each located in a separate physical site and linked by a high-speed, low-latency network. The solution can provide a zero recovery point objective (RPO) and a near-zero recovery time objective (RTO) due to the use of synchronous replication and an active-active configuration.
Uniform vs Non-uniform Host Access
vMSC supports both uniform and non-uniform host access configurations. In a uniform host access configuration, each ESX host can access the storage arrays in both sites, as pictured below.
In a non-uniform configuration, ESX hosts are configured to access only the storage array in their local site, as pictured below.
Single-site Storage
It is also possible to configure a vMSC with a single storage array. In this use case, ESX hosts are divided between Site A and Site B, and the storage array is placed in Site A, Site B, or a third site. A single storage array does not provide the resiliency that most customers require, as it does not protect against a storage array failure, or against a network partition when the array is placed in a third site.
Alternative to vMSC
vSAN Stretched Clusters are an attractive alternative to vMSC and offer several advantages.
vMSC Support
In the past, vMSC storage configurations were validated through a mandatory certification program. As of vSphere 6.0, this program was discontinued. Before purchasing, designing, or implementing a vSphere Metro Storage Cluster solution, contact your storage vendor to ensure that your solution has been tested with VMware Cloud Foundation 9.x.
Note: Broadcom|VMware does not test any vSphere Metro Storage Cluster solutions internally.
IMPORTANT: Many storage vendors have tested compatibility with ESX 9.x but have not completed testing by deploying the full VCF 9.x suite and validating all of the VCF Day 0 and Day 2 workflows.
Technical Requirements and Constraints
| Storage Type | VCF Installer Deployment Model | Management Domain (First Cluster) | Management Domain (Additional Clusters) or Workload Domain (All Clusters) |
|---|---|---|---|
| NFS, FC | Simple | Minimum: / Recommended: | Minimum: / Recommended: |
| NFS, FC | Highly Available | Minimum: / Recommended: | Not applicable |
Note: The disadvantage of using the minimum number of hosts rather than the recommended number is that lifecycle management tasks may require VMs to be moved from Site A to Site B. For example, a Highly Available deployment includes three NSX Managers; if only three (3) ESX hosts are used in Site A, one of the NSX Managers must be migrated to Site B during an upgrade. With the recommended four (4) ESX hosts, the NSX Manager can move to another host within the same site during the upgrade, which may provide better performance.
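The host-count reasoning above can be sketched as a simple calculation. This is an illustrative helper, not a VMware tool: it assumes one anti-affinity VM per host, plus spare hosts for those temporarily in maintenance mode during a rolling upgrade.

```python
def min_hosts_per_site(anti_affinity_vms: int, hosts_in_maintenance: int = 1) -> int:
    """Hosts needed per site to keep a set of anti-affinity VMs site-local
    while lifecycle operations take hosts out of service one at a time."""
    # Each anti-affinity VM needs its own host, plus headroom for hosts
    # placed into maintenance mode during an upgrade.
    return anti_affinity_vms + hosts_in_maintenance

# Three NSX Managers: with 3 hosts in Site A, one manager must move
# cross-site during an upgrade; with 4 hosts it can stay in-site.
print(min_hosts_per_site(3))  # → 4
```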
Common Network Configurations
There are several supported Network Configurations. The most common configurations include:
● Stretch Management VM Layer-2 Network Only
● Stretch All Layer-2 Networks
Stretch Management VM Layer-2 Network Only
In this example, the only network that is stretched is the Management VM Network, which hosts vCenter, NSX Manager, VCF Operations, and so on. This configuration increases the total number of subnets and VLANs required for the stretched cluster; however, it creates smaller failure domains for these networks. For example, a denial-of-service attack or a network broadcast storm can be isolated to the local VLAN. To ensure the fastest recovery time in the event of a failure, ensure that your VMs can be powered on in Availability Zone 2 without requiring IP addresses to be reconfigured. This configuration is not supported for the initial cluster in the Management Domain.
Note: If NFS is used as principal storage, the L2-network for NFS must be stretched between Availability Zones.
Stretch All Layer-2 Networks
In this example, all Layer-2 networks are stretched between Availability Zones, which reduces the total number of subnets required for the stretched cluster configuration. To ensure the fastest recovery time in the event of a failure, ensure that your VMs can be powered on in Availability Zone 2 without requiring IP addresses to be reconfigured.
Note: The ESX Mgmt Network (vmk0) VLAN 1610 in the Stretch All Networks example above does not need to be stretched. Some customers may prefer a unique network for ESX Mgmt, as pictured in the Stretch Management VM Layer-2 Network example.
Summary of Deployment Steps
VMware Cloud Foundation is unaware of vMSC and treats it the same as a standard vSphere cluster. As a result, the following manual steps must be performed.
Recommended vSphere HA Settings
The following settings are general recommendations. Consult your storage vendor or review your storage solution documentation to determine if these settings are applicable.
Failures and responses
| Setting | Value |
|---|---|
| Enable Host Monitoring | Enabled |
| Host Failure Response - Failure Response | Restart VMs |
| Response for Host Isolation | Power off and restart VMs |
| Datastore with PDL | Power off and restart VMs |
| Datastore with APD | Power off and restart VMs (Conservative) |
| VM Monitoring - Enable heartbeat monitoring | VM and Application Monitoring |
Admission Control
| Setting | Value |
|---|---|
| Host failures cluster tolerates | 1 (minimum) |
| Define host failover capacity by | Cluster resource percentage |
| Override calculated failover capacity | Enabled |
| Reserved failover CPU capacity | 50% |
| Reserved failover memory capacity | 50% |
| Reserve Persistent Memory failover capacity | Disabled |
| Override calculated Persistent Memory failover capacity | Disabled |
| Performance degradation VMs tolerate | 100% |
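The 50% CPU and memory reservation follows from the two-site topology. A minimal sketch of the arithmetic (hypothetical helper, assuming hosts are split evenly across sites and all hosts are identical):

```python
def surviving_capacity_pct(total_hosts: int, hosts_per_site: int) -> float:
    """Percentage of cluster capacity that survives a full site failure."""
    # Losing one site removes that site's hosts from the cluster; the
    # remainder is what HA can restart VMs onto.
    return (total_hosts - hosts_per_site) / total_hosts * 100

# An 8-host stretched cluster with 4 hosts per site: a site failure
# leaves 50% of capacity, which is why reserving 50% for failover
# guarantees every VM can be restarted in the surviving site.
print(surviving_capacity_pct(8, 4))  # → 50.0
```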
Advanced Options
| Option | Value |
|---|---|
| das.isolationaddress0 | IP address from Site A |
| das.isolationaddress1 | IP address from Site B |
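The reason for placing one isolation address in each site is that a vSphere HA host declares itself isolated only when it can reach none of the configured isolation addresses. A minimal sketch of that decision (the function and addresses are illustrative, not a VMware API):

```python
def host_is_isolated(reachability: dict) -> bool:
    """A host is network-isolated only if *no* isolation address responds."""
    # reachability maps each das.isolationaddress entry to the result of
    # pinging it from the host under test.
    return not any(reachability.values())

# Site A link down, Site B address still reachable: not isolated, so the
# isolation response ("Power off and restart VMs") is not triggered.
print(host_is_isolated({"10.0.1.1": False, "10.0.2.1": True}))   # → False
# Neither site's address reachable: the host is truly isolated.
print(host_is_isolated({"10.0.1.1": False, "10.0.2.1": False}))  # → True
```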
Recommended vSphere DRS Settings
Automation
| Setting | Value |
|---|---|
| Automation Level | Fully Automated |
| Migration Threshold | (3) Default |
Recommended DRS Host Group Configuration
It is recommended to create two DRS Host Groups: one that includes all of the ESX hosts from Site A and one that includes all of the ESX hosts from Site B.
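The intent of the host-group recommendation is to pair each per-site host group with a per-site VM group via a soft ("should run on") affinity rule, so VMs stay site-local during normal operation while HA can still restart them in the other site after a site failure. A hedged sketch of that structure (names and data shapes are illustrative, not a VMware API):

```python
def build_drs_rules(sites):
    """Build one host group and one should-run-on rule per site."""
    rules = []
    for site, hosts in sites.items():
        rules.append({
            "host_group": f"{site}-hosts",
            "hosts": sorted(hosts),
            "vm_group": f"{site}-vms",
            # Soft rule: DRS prefers in-site placement, but vSphere HA may
            # violate it to restart VMs in the surviving site.
            "policy": "should-run-on",
        })
    return rules

inventory = {
    "site-a": ["esx01", "esx02", "esx03", "esx04"],
    "site-b": ["esx05", "esx06", "esx07", "esx08"],
}
for rule in build_drs_rules(inventory):
    print(rule["host_group"], "<-", rule["vm_group"])
```

Using a "must run on" rule instead would pin VMs to one site and prevent HA from restarting them after a site failure, which is why the soft variant is preferred here.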
Recommended VM Group Configuration for the Management Domain
A VM Group should be created for the SDDC components of the Management Domain (for example, vCenter, NSX Manager, and VCF Operations).
Recommended VM Overrides for the Management Domain
The power-on order of the SDDC components of the Management Domain is important. VM Overrides can be used to control the power-on order of these critical VMs. Refer to Shutdown and Startup of VMware Cloud Foundation for the startup order.