Considerations for Implementing vSphere Metro Storage Cluster (vMSC) on VMware Cloud Foundation 9.x

Article ID: 417356
Products: VMware Cloud Foundation

Issue/Introduction

This article provides information about deploying vSphere Metro Storage Cluster with VMware Cloud Foundation 9.x.

Environment

VMware Cloud Foundation 9.x

Resolution

What is vMSC?

VMware vSphere® Metro Storage Cluster (vMSC) is a specific storage configuration commonly referred to as a stretched storage cluster or metro storage cluster. It is a single vSphere cluster that spans multiple physical sites.

A vMSC uses two independent storage systems, one in each physical site, linked by a high-speed, low-latency network. Because the solution combines synchronous replication with an active-active configuration, it can provide a zero recovery point objective (RPO) and a near-zero recovery time objective (RTO).

Uniform vs. Non-uniform Host Access

vMSC supports both uniform and non-uniform host access configurations. In a uniform host access configuration, each ESX host can access the storage arrays in both sites, as pictured below.

[Figure: Uniform host access configuration]

In a non-uniform configuration, each ESX host accesses only the storage array in its local site, as pictured below.

[Figure: Non-uniform host access configuration]

Single-site Storage

It is also possible to configure a vMSC with a single storage array. In this use case, ESX hosts are divided between Site A and Site B, and the storage array is placed in Site A, Site B, or a third site. A single storage array does not provide the resiliency that most customers require: it does not protect against a storage array failure, or against a network partition when the array is placed in a third site.

Alternative to vMSC

vSAN Stretched Clusters are an attractive alternative to vMSC and offer the following advantages:

  • vSAN is licensed as part of the VCF bundle
  • Automated workflows in VCF for deployment, eliminating the need for manual configuration
  • Automated workflows for Day 2 operations, such as cluster expansion
  • vSAN Stretched Clusters are a fully tested, validated, and supported solution from Broadcom|VMware

vMSC Support

In the past, vMSC storage configurations were validated through a mandatory certification program; that program was discontinued as of vSphere 6.0. Before purchasing, designing, or implementing a vSphere Metro Storage Cluster solution, contact your storage vendor to confirm that the solution has been tested with VMware Cloud Foundation 9.x.

Note: Broadcom|VMware does not test vSphere Metro Storage Cluster solutions internally.

IMPORTANT: Many storage vendors have tested compatibility with ESX 9.x but have not completed testing by deploying the full VCF 9.x suite and validating all of the VCF Day 0 and Day 2 workflows.

Technical Requirements and Constraints

  • The initial cluster of the Management Domain has unique requirements, including the following:
    • All ESX hosts in the initial cluster of the Management Domain must use the same L2 networks, with the exception of the ESX Management (vmk0) port.
    • The only network configuration supported by VCF Installer and SDDC Manager workflows such as Add Host is "Stretch All Layer-2 Networks"; see below.
    • If NFS is used as principal storage in the initial cluster of the Management Domain, the L2 network for NFS must also be stretched between sites.
  • Storage connectivity using Fibre Channel or NFSv3 is supported with the VCF Installer, VCF Operations, SDDC Manager, and VCF Import.
  • Storage connectivity using any other storage type is supported with VCF Import when converging or upgrading an existing vSphere environment.
  • The maximum supported network latency between ESX hosts in a vSphere cluster is 10 ms round-trip time (RTT); a spot-check sketch follows this list.
  • The maximum supported network latency between sites for synchronous replication is defined by the storage vendor; some vendors support up to 10 ms RTT, although many require 5 ms RTT.
  • Storage I/O Control is not supported on a vMSC datastore.
  • The Layer-2 network for the Management VMs, such as vCenter and NSX Manager, must always be stretched between sites in all configurations.
  • Some customers may also choose to stretch the vSAN, vMotion, and Host Overlay (TEP) L2 networks.
  • vSphere HA and DRS settings must be configured manually.
  • The minimum and recommended numbers of ESX hosts for VCF can be found in the VCF 9.0 Design Guide, Stretched Cluster Model Sizing Considerations section. For a vMSC these numbers are doubled to account for the second site, as summarized in the table below.
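The 10 ms limit can be spot-checked before deployment. Below is a minimal Python sketch, assuming it is run on Linux from a machine with ICMP reachability to the remote-site ESX management interfaces (the hostnames are placeholders); an authoritative test should use vmkping from the ESX hosts themselves against the relevant VMkernel interfaces.

```python
# Sketch: spot-check inter-site RTT against the 10 ms cluster limit.
# Hostnames are placeholders; run from a machine with ICMP reachability.
import re
import subprocess

def avg_rtt_ms(host: str, count: int = 5) -> float:
    """Average ICMP round-trip time in milliseconds (Linux ping output)."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True, check=True).stdout
    # Summary line looks like: rtt min/avg/max/mdev = 0.412/0.520/0.711/0.112 ms
    return float(re.search(r"= [\d.]+/([\d.]+)/", out).group(1))

for host in ("esx-b-01.example.com", "esx-b-02.example.com"):
    rtt = avg_rtt_ms(host)
    verdict = "OK" if rtt <= 10.0 else "EXCEEDS the 10 ms limit"
    print(f"{host}: {rtt:.2f} ms average RTT ({verdict})")
```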
Storage Type: NFS, FC

VCF Installer Deployment Model: Simple

  Management Domain (First Cluster), Management Domain (Additional Clusters), or Workload Domain (All Clusters):

    Minimum:
    One (1) host in Site A
    One (1) host in Site B
    Total: 2 ESX hosts

    Recommended:
    Two (2) hosts in Site A
    Two (2) hosts in Site B
    Total: 4 ESX hosts

VCF Installer Deployment Model: Highly Available

  Management Domain (First Cluster):

    Minimum:
    Three (3) hosts in Site A
    Three (3) hosts in Site B
    Total: 6 ESX hosts

    Recommended:
    Four (4) hosts in Site A
    Four (4) hosts in Site B
    Total: 8 ESX hosts

  Management Domain (Additional Clusters) or Workload Domain (All Clusters): Not applicable

Note: The disadvantage of using the minimum number of hosts rather than the recommended number is that lifecycle management tasks may require VMs to be moved from Site A to Site B. For example, a Highly Available deployment has three NSX Managers; if only three (3) ESX hosts are used in Site A, one of the NSX Managers must be migrated to Site B during an upgrade. With the recommended four (4) ESX hosts, the NSX Manager can move to another host within the same site during the upgrade, which may provide better performance.

Common Network Configurations

There are several supported network configurations. The most common configurations include:

  • Stretch Management VM Layer-2 Network Only
  • Stretch All Layer-2 Networks

Stretch Management VM Layer-2 Network Only

In this configuration the only network that is stretched is the Management VM network, which hosts vCenter, NSX Manager, VCF Operations, and so on. This increases the total number of subnets and VLANs required for the stretched cluster configuration; however, it creates smaller failure domains for these networks. For example, a denial-of-service attack or a network broadcast storm could be isolated to the local VLAN. To ensure the fastest recovery time in the event of a failure, ensure that your VMs can be powered on in Availability Zone 2 without requiring IP addresses to be reconfigured. This configuration is not supported for the initial cluster in the Management Domain.

Note: If NFS is used as principal storage, the L2 network for NFS must be stretched between Availability Zones.

Stretch All Layer-2 Networks

In this configuration all Layer-2 networks are stretched between Availability Zones, which reduces the total number of subnets required for the stretched cluster configuration. To ensure the fastest recovery time in the event of a failure, ensure that your VMs can be powered on in Availability Zone 2 without requiring IP addresses to be reconfigured.

Note: The ESX Management network (vmk0), VLAN 1610 in the Stretch All Layer-2 Networks example, does not need to be stretched. Some customers may wish to use a unique network for ESX Management, as pictured in the Stretch Management VM Layer-2 Network example.

Summary of Deployment Steps

VMware Cloud Foundation is unaware of vMSC and treats it the same as any other vSphere cluster. As a result, the following manual steps must be performed. Steps 3 through 8 can be performed in the vSphere Client or scripted; illustrative pyVmomi sketches appear in the sections that follow, starting with the connection scaffold below.

  1. Follow the storage vendor's instructions for configuring ESX host connectivity to the storage array.
  2. Create a cluster in VMware Cloud Foundation using the VCF Installer, VCF Operations, SDDC Manager, or VCF Import.
  3. Configure vSphere HA settings.
  4. Configure vSphere DRS settings.
  5. Configure DRS Host Groups.
  6. Configure DRS VM Groups.
  7. Create VM-Host Rules.
  8. Configure VM Overrides (restart priority).
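The sketches that follow are minimal pyVmomi examples, not VCF-validated automation. The vCenter address, credentials, and cluster name below are placeholders; each later snippet reuses the `si`, `content`, and `cluster` objects created here.

```python
# Minimal pyVmomi scaffold reused by the sketches below.
# vCenter address, credentials, and cluster name are placeholders.
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

def get_cluster(content, name):
    """Return the ClusterComputeResource with the given name, or None."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    try:
        return next((c for c in view.view if c.name == name), None)
    finally:
        view.Destroy()

ssl_ctx = ssl._create_unverified_context()  # lab only; use verified certs in production
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="changeme",
                  sslContext=ssl_ctx)
content = si.RetrieveContent()
cluster = get_cluster(content, "vMSC-Cluster01")
```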

Recommended vSphere HA Settings

The following settings are general recommendations. Consult your storage vendor or review your storage solution documentation to determine whether these settings are applicable.

Failures and responses

  Enable Host Monitoring: Enabled
  Host Failure Response: Restart VMs
  Response for Host Isolation: Power off and restart VMs
  Datastore with PDL: Power off and restart VMs
  Datastore with APD: Power off and restart VMs (Conservative)
  VM Monitoring (Enable heartbeat monitoring): VM and Application Monitoring

Admission Control

  Host failures cluster tolerates: 1 (minimum)
  Define host failover capacity by: Cluster resource percentage
  Override calculated failover capacity: Enabled
  Reserved failover CPU capacity: 50%
  Reserved failover memory capacity: 50%
  Reserve Persistent Memory failover capacity: Disabled
  Override calculated Persistent Memory failover capacity: Disabled
  Performance degradation VMs tolerate: 100%

Advanced Options

  das.isolationaddress0: IP address from Site A
  das.isolationaddress1: IP address from Site B
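As a hedged illustration, the settings above map to the vSphere API roughly as follows. This sketch reuses the `cluster` object from the scaffold in the deployment steps section; the isolation addresses are placeholders, and the enum values should be verified against the vSphere API reference for your release.

```python
# Sketch: apply the recommended vSphere HA settings to `cluster`.
# Isolation addresses are placeholders (one per site).
ha = vim.cluster.DasConfigInfo(
    enabled=True,
    hostMonitoring="enabled",
    vmMonitoring="vmAndAppMonitoring",
    vmComponentProtecting="enabled",  # required for the PDL/APD responses
    defaultVmSettings=vim.cluster.DasVmSettings(
        isolationResponse="powerOff",  # Power off and restart VMs
        vmComponentProtectionSettings=vim.cluster.VmComponentProtectionSettings(
            vmStorageProtectionForPDL="restartAggressive",    # Power off and restart VMs
            vmStorageProtectionForAPD="restartConservative",  # Conservative restart policy
        ),
    ),
    admissionControlEnabled=True,
    admissionControlPolicy=vim.cluster.FailoverResourcesAdmissionControlPolicy(
        autoComputePercentages=False,            # Override calculated failover capacity
        cpuFailoverResourcesPercent=50,
        memoryFailoverResourcesPercent=50,
        failoverLevel=1,                         # Host failures cluster tolerates
        resourceReductionToToleratePercent=100,  # Performance degradation VMs tolerate
    ),
    option=[
        vim.option.OptionValue(key="das.isolationaddress0", value="192.0.2.10"),     # Site A
        vim.option.OptionValue(key="das.isolationaddress1", value="198.51.100.10"),  # Site B
    ],
)
cluster.ReconfigureComputeResource_Task(
    spec=vim.cluster.ConfigSpecEx(dasConfig=ha), modify=True)
```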

 

Recommended vSphere DRS Settings

Automation

  Automation Level: Fully Automated
  Migration Threshold: (3) Default
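A matching sketch for the DRS settings, again reusing `cluster` from the scaffold. The mapping of the UI migration threshold slider to the API's vmotionRate field should be verified for your release; 3 is the default midpoint in both.

```python
# Sketch: enable fully automated DRS at the default migration threshold.
drs = vim.cluster.DrsConfigInfo(
    enabled=True,
    defaultVmBehavior="fullyAutomated",
    vmotionRate=3,  # API counterpart of the UI migration threshold; 3 is the default
)
cluster.ReconfigureComputeResource_Task(
    spec=vim.cluster.ConfigSpecEx(drsConfig=drs), modify=True)
```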

 

Recommended DRS Host Group Configuration

It is recommended to create two DRS Host Groups: one containing all of the ESX hosts in Site A, and one containing all of the ESX hosts in Site B, as sketched below.
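For example, with pyVmomi the two host groups might be created as follows; the host-name prefixes used to split the sites are placeholders for however your environment distinguishes Site A and Site B hosts.

```python
# Sketch: create Site A and Site B DRS host groups.
# The host-name prefixes are placeholders.
site_a = [h for h in cluster.host if h.name.startswith("esx-a")]
site_b = [h for h in cluster.host if h.name.startswith("esx-b")]

group_spec = [
    vim.cluster.GroupSpec(operation="add",
        info=vim.cluster.HostGroup(name="SiteA-Hosts", host=site_a)),
    vim.cluster.GroupSpec(operation="add",
        info=vim.cluster.HostGroup(name="SiteB-Hosts", host=site_b)),
]
cluster.ReconfigureComputeResource_Task(
    spec=vim.cluster.ConfigSpecEx(groupSpec=group_spec), modify=True)
```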

 

Recommended VM Group Configuration for the Management Domain

A VM Group should be created for the SDDC components of the Management Domain, which include the following:

  • vCenter
  • SDDC Manager
  • NSX Manager
  • NSX Edge
  • VMware Live Site Recovery
  • VCF Operations
  • VCF Operations Fleet Management
  • VCF Identity Broker
  • VCF Operations for Logs
  • VCF Operations Collector
  • VCF Operations for Networks
  • VCF Automation
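Below is a sketch of creating such a VM group with pyVmomi, paired with a "should run on Site A hosts" VM-to-host rule, which is a common but optional vMSC pattern. The VM names, group names, and site preference are placeholders.

```python
# Sketch: VM group for the Management Domain SDDC VMs, plus an optional
# "should run on Site A hosts" rule. VM names and the site preference are
# placeholders; cluster.resourcePool.vm lists only VMs in the root pool.
sddc_names = {"vcenter01", "sddc-manager01", "nsx-mgr01", "vcf-ops01"}
sddc_vms = [v for v in cluster.resourcePool.vm if v.name in sddc_names]

spec = vim.cluster.ConfigSpecEx(
    groupSpec=[vim.cluster.GroupSpec(operation="add",
        info=vim.cluster.VmGroup(name="SDDC-VMs", vm=sddc_vms))],
    rulesSpec=[vim.cluster.RuleSpec(operation="add",
        info=vim.cluster.VmHostRuleInfo(
            name="SDDC-VMs-prefer-SiteA",
            vmGroupName="SDDC-VMs",
            affineHostGroupName="SiteA-Hosts",
            mandatory=False,  # "should" rule: HA may still restart VMs in Site B
            enabled=True))],
)
cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```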

Recommended VM Overrides for the Management Domain


The power-on order of the SDDC components of the Management Domain is important. VM Overrides can be used to control the power-on order of these critical VMs. Refer to Shutdown and Startup of VMware Cloud Foundation for the startup order.
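VM Overrides can also be scripted. A minimal sketch follows, assuming hypothetical VM names and an illustrative priority mapping; the actual order must come from the Shutdown and Startup of VMware Cloud Foundation documentation.

```python
# Sketch: per-VM HA restart priority overrides. The VM names and priority
# values are illustrative placeholders.
priorities = {"vcenter01": "highest", "nsx-mgr01": "high", "sddc-manager01": "medium"}

das_vm_specs = [
    vim.cluster.DasVmConfigSpec(operation="add",
        info=vim.cluster.DasVmConfigInfo(
            key=vm,
            dasSettings=vim.cluster.DasVmSettings(
                restartPriority=priorities[vm.name])))
    for vm in cluster.resourcePool.vm if vm.name in priorities
]
cluster.ReconfigureComputeResource_Task(
    spec=vim.cluster.ConfigSpecEx(dasVmConfigSpec=das_vm_specs), modify=True)
```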