ActiveCluster VMware vMSC

Article ID: 330154

Updated On: 06-08-2020

Products

VMware vCenter Server

Issue/Introduction

This article provides information on using Pure Storage ActiveCluster as a VMware vSphere Metro Storage Cluster (vMSC) solution.

Resolution

What is vMSC?

vSphere Metro Storage Cluster (vMSC) is a partner-supported high availability solution that combines array-based synchronous replication and vSphere capabilities such as VMware HA clusters. For more information about vMSC, see the VMware Partner Verified and Supported Products page.

What is ActiveCluster?

Pure Storage® ActiveCluster is a fully symmetric active/active bidirectional replication solution that provides synchronous replication for RPO zero and automatic transparent failover for RTO zero. ActiveCluster spans multiple sites enabling clustered arrays and clustered ESXi hosts to be used to deploy flexible active/active datacenter configurations.



Symmetric Active/Active - Read and write to the same volumes at either side of the mirror, with optional host-to-array site awareness.

Transparent Failover with Preferences - Automatic Non-disruptive failover between synchronously replicating arrays and sites with automatic resynchronization and recovery.

Active-Active Asynchronous Replication - Integrated Target-Orchestrated asynchronous replication provides resilient and dramatically simplified out of region (3rd site) data protection. 

No Bolt-ons & No Licenses - No additional hardware required, no costly software licenses required, just upgrade the Purity Operating Environment and go active/active!

Simple Management - Perform data management operations from either side of the mirror, provision storage, connect hosts, create snapshots, create clones.

Integrated Pure1® Cloud Mediator - Automatically configured passive mediator that allows transparent failover and prevents split-brain, without the need to deploy and manage another component.

Core Components

Purity ActiveCluster is composed of three core components: the Pure1 Mediator, active/active clustered array pairs, and stretched storage containers.


The Pure1 Cloud Mediator - A required component of the solution that is used to determine which array will continue data services should an outage occur in the environment. An on-premises mediator VM is also available. 

Active/Active Clustered FlashArrays - Utilize synchronous replication to maintain a copy of data on each array and present those as one consistent copy to hosts that are attached to either, or both, arrays.

Stretched Storage Containers - Management containers that collect storage objects such as volumes into groups that are stretched between two arrays. Stretched storage containers also provide consistent IO continuation behavior for the storage objects within them.

Requirements

The following items are required to use Pure Storage ActiveCluster.

Solution Component Requirements

  • Two Pure Storage FlashArrays running Purity version 5.0.0 or higher (Note: Active-Active Asynchronous configurations require Purity version 5.2.0 or higher and at least 3 FlashArrays).
  • The Pure1 Cloud Mediator or the on-premises mediator installed in a third site, in a separate failure domain from both of the array sites.
  • VMware HA ESXi host protection for the mediator if using the on-premises mediator.


Replication Network Requirements

The replication network is used for the initial asynchronous transfer of data to stretch a pod, to synchronously transfer data and configuration information between the arrays, and to resynchronize a pod.
  • Maximum 11ms round trip latency between clustered FlashArrays.
  • 4 10GbE replication ports per array (two per controller). Two replication ports per controller are required to ensure redundant access from the primary controller to the other array.
  • 4 dedicated replication IP addresses per array.
  • A redundant switched replication network. Direct connecting FlashArrays for replication is not possible.
  • Adequate bandwidth between arrays to support bi-directional synchronous writes and bandwidth for resynchronizing. This depends on the write rate of the hosts at both sites.
  • Active-Active Asynchronous Replication requires one connection from each ActiveCluster FlashArray [2 total] to the target FlashArray.

Management Network Requirements

The management network is necessary to connect the arrays and to connect to the mediator. A temporary outage of the management network alone does not cause a failure.
  • 4 1GbE management ports per array (two per controller). Two management ports per controller are required to ensure redundant access from the primary controller to the mediator.
  • A minimum of 5 management IP addresses per array. These are:
    • 1 IP address for vir0
    • 2 physical Ethernet port IP addresses configured to support vir0 and to be connected to the first management network.
    • 2 physical port IP addresses for connection to the second management network. These are not required to be configured under a virtual interface.
Note: For ease of manageability, a second virtual interface, vir1, can be configured to use the two IP addresses on the second management network, provided a 6th IP address is available on the second management network.
  • Independent management network access from both array sites to the mediator, such that no single network outage can prevent both arrays from accessing the mediator. 
Note:  Some multi-site networks connect to the internet through just one of the two sites. This is known as a backhauled internet design. Backhauled designs are inherently a single point of failure from a Pure1 Cloud Mediator access perspective. 

For network configurations such as these, the on-premises mediator should be used. The same requirements still apply for the on-premises mediator: 
  • Independent management network access to the mediator VM from each FlashArray 
  • HA protection of the mediator VM 
  • 3rd fault domain (isolated 3rd site) deployment of the mediator VM
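
The numeric requirements listed above lend themselves to a simple pre-deployment check. The following is a minimal sketch in Python, not a Pure Storage tool: the `ArrayNetworkPlan` class, its field names, and the `check_plan` function are hypothetical names introduced only for illustration. The thresholds (11ms round trip, 4 replication ports, 4 replication IP addresses, 4 management ports, 5 management IP addresses, switched replication network) are taken from the lists above.

```python
from dataclasses import dataclass


@dataclass
class ArrayNetworkPlan:
    """Hypothetical description of one FlashArray's planned network setup."""
    name: str
    replication_ports: int
    replication_ips: int
    management_ports: int
    management_ips: int


def check_plan(plan, rtt_ms, switched):
    """Return a list of unmet requirements; an empty list means none were found."""
    problems = []
    if rtt_ms > 11:
        problems.append(f"{plan.name}: round-trip latency {rtt_ms} ms exceeds 11 ms")
    if not switched:
        problems.append(f"{plan.name}: replication network must be switched, not direct-connected")
    if plan.replication_ports < 4:
        problems.append(f"{plan.name}: needs 4 replication ports (two per controller)")
    if plan.replication_ips < 4:
        problems.append(f"{plan.name}: needs 4 dedicated replication IP addresses")
    if plan.management_ports < 4:
        problems.append(f"{plan.name}: needs 4 management ports (two per controller)")
    if plan.management_ips < 5:
        problems.append(f"{plan.name}: needs at least 5 management IP addresses")
    return problems


if __name__ == "__main__":
    plan = ArrayNetworkPlan("array-a", replication_ports=4, replication_ips=4,
                            management_ports=4, management_ips=5)
    for line in check_plan(plan, rtt_ms=5.0, switched=True) or ["no problems found"]:
        print(line)
```

The check covers only the countable items. Bandwidth sizing and independent mediator access from each site depend on the environment and still need to be reviewed by hand.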

Deployment Options

The following section describes basic connectivity examples for array-to-host connections. Purity ActiveCluster supports both uniform storage access and non-uniform storage access.

Uniform Storage Connectivity

A uniform storage access model can be used in environments where there is host-to-array connectivity of either FC or ethernet (for iSCSI), and array-to-array ethernet connectivity, between the two sites. When deployed in this way a host has access to the same volume through both the local array and the remote array. The solution supports connecting arrays with up to 11ms of round-trip time (RTT) latency between the arrays.

The image above represents the logical paths that exist between the hosts and arrays, and the replication connection between the two arrays in a uniform access model. Because a uniform storage access model allows all hosts, regardless of site location, to access both arrays there will be paths with different latency characteristics. Paths from hosts to the local array will have lower latency; paths from each local host to the remote array will have higher latency. 

Optimizing Performance in Uniform Access Environments

For the best performance in active/active synchronous replication environments, hosts should be prevented from using paths that access the remote array unless necessary. For example, in the image below, if VM 2A were to perform a write to volume A over the host-side connection to array A, that write would incur 2X the latency of the inter-site link, 1X for each traverse of the network. The write would experience 11ms of latency for the trip from host B to array A and another 11ms of latency while array A synchronously sends the write to array B.
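
The latency arithmetic in this example can be written out as a short calculation. This is a simplified sketch, not a performance model: the function name `write_latency_ms` and the `local_service_ms` parameter are assumptions made for illustration, and only inter-site round trips are counted.

```python
def write_latency_ms(inter_site_rtt_ms, uses_remote_path, local_service_ms=0.0):
    """Estimate the inter-site latency added to one write in a stretched pod.

    Every write is synchronously replicated, which costs one inter-site
    round trip. A write sent over a path to the remote array crosses the
    inter-site link a second time. local_service_ms is a hypothetical
    placeholder for everything else (local fabric, array service time).
    """
    latency = local_service_ms + inter_site_rtt_ms  # synchronous replication
    if uses_remote_path:
        latency += inter_site_rtt_ms  # host-to-remote-array trip
    return latency


if __name__ == "__main__":
    print(write_latency_ms(11, uses_remote_path=False))  # local path: 1X
    print(write_latency_ms(11, uses_remote_path=True))   # remote path: 2X
```

At the maximum supported 11ms round-trip time, a write over a local path incurs 11ms of inter-site latency and a write over a remote path incurs 22ms.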

In other metro storage clustering (MSC) solutions, the continuous management of this difference in latency can be a challenge. Most non-Pure MSC solutions make use of per-volume Asymmetric Logical Unit Access (ALUA), a mechanism that allows a storage array to advertise path priorities to a host. The host can then distribute its I/O on optimized paths and avoid sending I/O on non-optimized paths. When using non-Pure solutions, storage administrators must take care that VMs run only on hosts that have local optimized access to the volume in which each VM's data resides. Migrating a VM to a host in the other site can cause that VM to experience 2X the latency between the sites for each write it performs if the new host does not have local optimized access to that volume, as described earlier. To remedy this, the VM's data must be migrated to a different volume that allows local optimized access from the VM's new host. This is because other MSC solutions manage optimized paths on a per-volume basis, where each volume can have optimized paths on one array or the other, but not both at the same time.

With Pure Storage Purity ActiveCluster there are no such management headaches. ActiveCluster does make use of ALUA to expose paths to local hosts as active/optimized and paths to remote hosts as active/non-optimized. However, there are two advantages in the ActiveCluster implementation:
  1. In ActiveCluster volumes in stretched pods are read/write on both arrays. There is no such thing as a passive volume that cannot service both reads and writes. 
  2. The optimized path is defined on a per host-to-volume connection basis using a preferred array option; this ensures that regardless of what host a VM or application is running on it will have a local optimized path to that volume.
With ActiveCluster you may configure truly active/active datacenters and do not have to care what site or host a VM runs on; the VM will always have the same performance regardless of site. While a VM 1A is running on host A accessing volume A it will use only the local optimized paths as shown in the next image.

If the VM or application is switched to a host in the other site, with the data left in place, only local paths will be used in the other site as shown in the next image. There is no need to adjust path priorities or migrate the data to a different volume to ensure local optimized access.
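
The effect of defining the optimized path per host connection, rather than per volume, can be sketched as follows. This is an illustrative model only: the function names, the array names, and the rule that a host falls back to non-optimized paths when no optimized path is reachable are assumptions that reflect common ALUA multipathing behavior, not a description of the Purity implementation.

```python
def path_state(host_preferred_array, array):
    """ALUA state a host sees for paths to the given array."""
    if array == host_preferred_array:
        return "active/optimized"
    return "active/non-optimized"


def select_paths(host_preferred_array, reachable_arrays):
    """Arrays the host sends I/O to: optimized paths first, others as fallback."""
    states = {a: path_state(host_preferred_array, a) for a in reachable_arrays}
    optimized = [a for a, s in states.items() if s == "active/optimized"]
    return sorted(optimized) if optimized else sorted(states)


if __name__ == "__main__":
    arrays = ["array-a", "array-b"]
    # The same stretched volume, accessed from a host in each site.
    print(select_paths("array-a", arrays))      # host in site A
    print(select_paths("array-b", arrays))      # host in site B
    print(select_paths("array-a", ["array-b"])) # site A host after array-a paths fail
```

Because the preference belongs to the host connection, a VM that moves from a site A host to a site B host lands on local optimized paths without any change to the volume.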

Non-Uniform Storage Connectivity

A non-uniform storage access model is used in environments where there is host-to-array connectivity of either FC or Ethernet (for iSCSI) only locally within the same site. Ethernet connectivity for the array-to-array replication interconnect must still exist between the two sites. When deployed in this way each host has access to a volume only through the local array and not the remote array. The solution supports connecting arrays with up to 11ms of round-trip time (RTT) latency between the arrays.

Hosts distribute I/O across all available paths to storage, which in this model are only the local active/optimized paths.

Failover Scenarios

The behavior of the environment during some failure events differs depending on the host access configuration, uniform or non-uniform. In a uniform storage access configuration hosts may simply experience the loss of some storage paths; there is no storage failover process. In a non-uniform storage access configuration there is again no storage failover process; however, the VMs running on ESXi hosts connected to the offline array will be restarted on other ESXi hosts by VMware HA.

Note: ActiveCluster uses storage containers called pods to define which volumes are to be synchronously replicated between storage arrays. This is referred to as stretching a pod. Volumes that are not in stretched pods remain online through failure scenarios where the array remains online.

How Transparent Failover Occurs

Failover is automatic and no storage administrator intervention is necessary to perform a failover with ActiveCluster.
An automatic failover requires at least one array with access to the mediator and may be triggered by any of the following:
  • Failure of an array.
  • Failure of the replication link between the two arrays.
  • Failure of an entire array site, a site-wide disaster.
A complete failure of the storage network in one site may be said to be a failover but it is not a storage failover. In such a case both arrays are able to service I/O but access is only available for ESXi hosts in the site where the storage network is still online.

Failover Preferences

Active-Active datacenters often have workloads that tend to run at one site or at the other. The site that applications tend to run in may be determined by historical convention, administrative convenience, infrastructure disparity, primary user location, or any of a number of other reasons. When ActiveCluster performs a mediator race following the loss of the synchronous replication links between datacenters, the outcome of the race can be unpredictable. For non-uniform host connectivity, the lack of mediator race predictability can mean a disruptive restart for applications running on stretched pod volumes on the losing FlashArray. ActiveCluster provides a failover preference feature which allows the mediator race to be administratively influenced on a per-pod basis. This capability enables administrators to align pod failover behavior with the site where each application tends to run. Functionally, the failover preference feature gives the preferred FlashArray for each pod a 6-second head start in its race to the mediator.

In the illustration above, the stretched pod containing volume A prefers Site A (preference is indicated by the orange letter P at the top left corner of the A pod). Due to the head start it is given, Pod A is more likely to win its race to the mediator and stay online at Site A. The second pod with volume B prefers Site B (preference is indicated by the orange letter P at the top right corner of the B pod). Pod B is more likely to stay online at Site B after winning its race to the mediator. In this way, site and array alignment with applications can be established, allowing hosts that use non-uniform connectivity to continue without the need for a disruptive restart.

As described earlier, with non-uniform host connectivity, hosts are connected to just one FlashArray. If that one array suspends stretched pod volumes, the applications running on them must be disruptively restarted by host cluster software on hosts that have connections to the remote FlashArray. By setting a failover preference for Site A or Site B, the ActiveCluster administrator can mitigate against a disruptive restart of applications. Disruptive restarts will be confined to cases where one FlashArray is offline or unreachable, or where an entire site is lost. Setting a failover preference for pods supporting clustered applications running on non-uniformly connected hosts is, therefore, a recommended best practice.

The key advantage of using a failover preference setting compared to a fixed site bias is that if the preferred array fails (or if the site is lost), the non-preferred array can still win the race to the mediator and keep all pod volumes online. A static, fixed bias can lead to a total cluster outage, as the non-preferred array must suspend IO regardless of what happens to the preferred array.
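
The difference between a head start and a fixed bias can be illustrated with a toy model of the mediator race. This is a sketch under stated assumptions, not the Purity algorithm: the function name `race_winner`, the per-array "seconds to reach the mediator" inputs, and modeling the head start as a delay added to the non-preferred array are all hypothetical. Only the 6-second value comes from the text above.

```python
HEAD_START_S = 6.0  # head start given to the preferred array for a pod


def race_winner(reach_time_s, preferred=None):
    """Pick the array that reaches the mediator first for one pod.

    reach_time_s maps array name -> seconds needed to reach the mediator,
    or None when that array is down or cannot reach the mediator.
    Returns the winning array name, or None if no array can reach it.
    """
    effective = {}
    for array, seconds in reach_time_s.items():
        if seconds is None:
            continue  # failed or isolated arrays do not take part in the race
        delayed = preferred is not None and array != preferred
        effective[array] = seconds + (HEAD_START_S if delayed else 0.0)
    if not effective:
        return None
    return min(effective, key=effective.get)


if __name__ == "__main__":
    times = {"array-a": 2.0, "array-b": 0.5}
    print(race_winner(times))                       # no preference set
    print(race_winner(times, preferred="array-a"))  # pod prefers array-a
    print(race_winner({"array-a": None, "array-b": 0.5}, preferred="array-a"))
```

In the last call the preferred array has failed, yet the non-preferred array still wins and keeps the pod online. A fixed bias would instead leave the pod offline in that case.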

Pre-Election

Starting with Purity 5.3, ActiveCluster uses its built-in Mediator polling mechanism to allow both FlashArrays to jointly agree on a mediation race winner for each stretched pod when both arrays are unable to reach the mediator. Pre-Election uses the pod failover preference setting (if set) to determine the winning FlashArray for each pod. If no pod failover preference is set, a winner is selected automatically. This means that if the Pure1 Cloud Mediator (or the On-Premises Mediator) is offline or unreachable for an extended period of time, ActiveCluster can still provide access to stretched pod volumes if the replication links subsequently fail. Pre-Election automatically disengages when one or both of the FlashArrays re-establish contact with the Mediator. Pre-Election is always on and requires no administrative setup, no ongoing monitoring, and no upgrade to the Mediator (cloud or on-premises). For more information, see the Access to Storage Through Failures section below.

Note: The Mediator polling cycle on each FlashArray is independent. As a result, it can take up to 5 minutes for both arrays to identify the loss or the return of the Mediator.
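
The pre-election decision for a single pod can be sketched as below. This is a simplified illustration, not the Purity implementation: the function name is hypothetical, the model conservatively treats pre-election as engaged only once the mediator has been unreachable for the full 5 minutes, and the alphabetical choice used when no preference is set is only a stand-in, because the article does not describe how the automatic selection is made.

```python
DETECTION_WINDOW_S = 300  # up to 5 minutes for both arrays to notice the loss


def pre_elected_winner(pod_preference, arrays, mediator_down_s):
    """Return the pre-elected array for a pod, or None if not yet engaged.

    pod_preference: array name set as the pod failover preference, or None.
    arrays: names of the two arrays that host the stretched pod.
    mediator_down_s: seconds the mediator has been unreachable from both arrays.
    """
    if mediator_down_s < DETECTION_WINDOW_S:
        return None  # both arrays may not have detected the loss yet
    if pod_preference in arrays:
        return pod_preference
    # Stand-in for the automatic selection, which the article does not specify.
    return sorted(arrays)[0]


if __name__ == "__main__":
    arrays = ["array-a", "array-b"]
    print(pre_elected_winner("array-b", arrays, mediator_down_s=600))
    print(pre_elected_winner(None, arrays, mediator_down_s=600))
    print(pre_elected_winner("array-b", arrays, mediator_down_s=60))
```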

How to Resynchronize and Recover

Resynchronization and recovery are automatic and no storage administrator intervention is necessary to resynchronize and recover replication with ActiveCluster.

Internal checkpoints are created periodically that provide a known in-sync state from which the arrays can automatically resynchronize. When the connection between arrays is restored any changes made since the outage on the array that kept the stretched pod online will be asynchronously transferred to the other array. The arrays will get in sync via shorter and shorter periodic asynchronous background transfers. Once the arrays are nearly in sync they will smoothly transition to synchronous replication mode and the data paths to the offline side of the pod will be automatically restored, allowing the ESXi hosts to perform I/O through both arrays again.
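
Why the background transfers become "shorter and shorter" can be seen in a small numerical model. This is an illustrative sketch only: the function name, the constant link and write rates, and the 1 MB hand-over threshold are assumptions, and the real resynchronization logic is internal to Purity. Each pass copies the current backlog, and while it runs the hosts write new data that forms the next, smaller backlog.

```python
def resync_passes(backlog_mb, link_mb_s, write_mb_s, sync_threshold_mb=1.0):
    """Return the duration in seconds of each asynchronous catch-up pass.

    backlog_mb: data changed on the surviving array during the outage.
    link_mb_s: replication throughput available for resynchronization.
    write_mb_s: rate at which hosts keep writing during the resync.
    sync_threshold_mb: hypothetical backlog size at which the arrays
        are close enough to switch back to synchronous replication.
    """
    if write_mb_s >= link_mb_s:
        raise ValueError("link must outpace host writes or the backlog never shrinks")
    durations = []
    while backlog_mb > sync_threshold_mb:
        duration = backlog_mb / link_mb_s  # time to copy the current backlog
        durations.append(duration)
        backlog_mb = write_mb_s * duration  # new writes that arrived meanwhile
    return durations


if __name__ == "__main__":
    for number, seconds in enumerate(resync_passes(1000, 100, 50), start=1):
        print(f"pass {number}: {seconds:.3f} s")
```

In this model each pass shrinks by the ratio of write rate to link rate, which is one way to see why adequate replication bandwidth for resynchronization is listed among the requirements.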

Access to Storage Through Failures

The table below describes whether or not ActiveCluster is able to service I/O for volumes that are configured for synchronous replication, on one array or the other, when a component outage occurs. Note that 2 of the 3 main components (array A, array B, mediator) must be online and accessible for IO service to continue on the surviving array. Replication link failure or array failure, while the mediator is unavailable, results in a stop of I/O to sync replicated volumes on both arrays to prevent split brain. The mediator is a required component of the solution.
 

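Solution Component Failure and Access to Storage

| One Array                | Other Array | Replication Link | Mediator     | Access to Storage              |
|--------------------------|-------------|------------------|--------------|--------------------------------|
| UP                       | UP          | UP               | UP           | Available on both arrays       |
| UP                       | DOWN        | UP               | UP           | Available on one array         |
| UP                       | UP          | DOWN             | UP           | Available on one array         |
| UP                       | UP          | UP               | DOWN         | Available on both arrays       |
| UP                       | DOWN        | DOWN             | UP           | Available on one array         |
| UP                       | UP          | DOWN             | DOWN*        | Unavailable                    |
| UP                       | DOWN        | UP               | DOWN*        | Unavailable                    |
| UP                       | UP          | DOWN             | DOWN > 5 min | Available on one array         |
| UP (pre-election winner) | DOWN        | UP               | DOWN > 5 min | Available on pre-elected array |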

* These rows refer to simultaneous failures of other components while the mediator is unavailable.  If the mediator becomes unavailable after an array failure or a replication link failure has already been sustained, then access to storage will remain available on one array as the mediator is not required to resolve the situation. 
* If the mediator becomes unavailable to both arrays for at least 5 minutes prior to the replication network links failing then the pre-elected array will keep stretched pod volumes online. 
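
The table and its notes can be condensed into a small decision function. This is a sketch for reasoning about the rows above, not vendor code: the function name, parameter names, and return strings are hypothetical. It covers only the cases in the table, where the mediator is already unavailable when the array or replication link failure occurs.

```python
def storage_access(array_a_up, array_b_up, link_up, mediator_up,
                   mediator_down_over_5_min=False, pre_elected="A"):
    """Return where stretched pod volumes stay available.

    pre_elected names the array ("A" or "B") chosen by pre-election; it
    only matters when the mediator has been down for more than 5 minutes.
    """
    arrays_up = [name for name, up in (("A", array_a_up), ("B", array_b_up)) if up]
    if not arrays_up:
        return "unavailable"
    if len(arrays_up) == 2 and link_up:
        return "both arrays"  # nothing to arbitrate, mediator state is irrelevant
    if mediator_up:
        return "one array"  # the mediator race decides which array continues
    if mediator_down_over_5_min and pre_elected in arrays_up:
        return "one array"  # the pre-elected array keeps the pod online
    return "unavailable"  # I/O stops on both arrays to prevent split brain


if __name__ == "__main__":
    print(storage_access(True, True, False, mediator_up=True))
    print(storage_access(True, True, False, mediator_up=False))
    print(storage_access(True, False, True, mediator_up=False,
                         mediator_down_over_5_min=True, pre_elected="A"))
```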

Failure Scenario Behaviors

The following table describes the effect of various failure events in the environment. In this table the following terminology is used.

Mediator - Refers to the Pure1 Cloud Mediator or the on-premises VM mediator. In either case the mediator is located at a third site outside of the failure domain of either array's site. The failover behaviors are the same regardless of which mediator is used.

Mediator winner - The array that is first to reach the mediator and therefore keeps data services online for volumes in stretched pods.

Mediator loser - The array that is unable to reach the mediator, or is 2nd to reach the mediator, and therefore must turn data services offline for volumes in stretched pods.

FlashArray Component, Replication Network, and Site Failures

 Failure Scenario

FlashArray Behavior & VMware HA/ESXi Host Behavior

Single path failure to storage array

No effect. Multipath software in the ESXi host manages failover to other paths.

Local controller failover in one array.

After a short pause for the duration of the local controller failover, host I/O will continue to both arrays without losing RPO-Zero.

Replication link failure.

After a short pause, host IO continues to volumes only on the array that contacts the mediator first. This is per pod.

Failover is automatic and transparent and no administrator intervention is necessary.

Uniform connected hosts:
  • after a short pause in IO, continue IO to the array that won the race to the mediator.
  • experience some storage path failures for paths to the array that lost the race to the mediator.
  • in the mediator losing site will maintain access to volumes remotely across the stretched SAN to the mediator winning site.

Non-uniform connected hosts:
  • in the mediator winning site will maintain access to volumes with no more than a pause in IO.
  • in the mediator losing site will experience total loss of access to volumes.
  • VMware HA will restart failed VMs on an ESXi host in the mediator winning site.

Entire single array failure.

After a short pause, host IO automatically continues on the surviving array.

Failover is automatic and transparent and no administrator intervention is possible or necessary.

Uniform connected hosts:
  • in the surviving array site, after a short pause in IO, continue IO to the surviving array that was able to reach the mediator.
  • experience some storage path failures for paths to the failed array.
  • in the site where the array failed will do IO to volumes remotely across the stretched SAN to the surviving array.

Non-uniform connected hosts:
  • in the surviving array site, after a short pause in IO, continue IO to the surviving array that was able to reach the mediator.
  • in the failed array site will experience total loss of access to volumes.
  • VMware HA will restart failed VMs on an ESXi host connected to the surviving array.

Entire site failure

After a short pause, host IO automatically continues on the surviving array.

Failover of the array is automatic & transparent and no administrator intervention is possible or necessary.

Uniform connected hosts:
  • in the surviving array site, after a short pause in IO, VMs continue IO to the surviving array that was able to reach the mediator.
  • experience some storage path failures for paths to the array in the failed site.
  • VMware HA will restart failed VMs on an ESXi host in the surviving site.

Non-uniform connected hosts:
  • in the surviving site, after a short pause in IO, continue IO to the surviving array that was able to reach the mediator.
  • in the surviving site will maintain access to local volumes with no more than a pause in IO.
  • VMware HA will restart failed VMs on an ESXi host in the surviving site.

Entire site failure and restore of both sites

FlashArrays reconnect and automatically resynchronize. One array serves data until resynchronization completes.

Mediator failure or access to mediator fails.

No effect. Host IO continues through all paths on both arrays as normal.

Mediator failure less than 5 minutes before a second failure of replication link, or failure of one array, or failure of one site.

(2nd failure occurs while mediator is unavailable)

Host IO access is lost to sync rep volumes on both arrays if the Mediator is lost shortly (up to 5 min) before subsequent replication network loss.

This is a double failure scenario; data service is not maintained through failure of either array if the mediator becomes unavailable shortly (up to 5 min) before the replication network is lost.

Options to recover:

  1. Restore access to either the mediator or replication interconnect and the volumes will automatically come back online, as per above scenarios.
  2. Clone the pod which creates new volumes with different LUN serial numbers. New LUN serial numbers will prevent hosts from automatically connecting to and using the volumes, avoiding split brain. Then re-identify and reconnect all LUNs on all hosts.

Mediator failure more than 5 minutes before a second failure of replication link, or failure of one array, or failure of one site.

(2nd failure occurs while mediator has been unavailable for more than 5 minutes)

Host IO access is maintained to sync rep volumes on the pre-elected array. This assumes the pre-elected array is not also destroyed or lost in the second failure. If the mediator is lost for more than 5 minutes and the pre-elected array is destroyed, then data service is not maintained on the surviving FlashArray.

Options to recover:

  1. Restore access to either the mediator or replication interconnect and the volumes will automatically come back online, as per above scenarios.
  2. Clone the pod which creates new volumes with different LUN serial numbers. New LUN serial numbers will prevent hosts from automatically connecting to and using the volumes, avoiding split brain. Then re-identify and reconnect all LUNs on all hosts.

Host and Storage Network Failures

 Failure Scenario

Failure Behavior

Single or multiple ESXi host failure

VMware HA automatically restarts VMs on surviving ESXi hosts in the same site or in the other site.

ESXi host management network failure

Depending on VMware HA host isolation response setting, VMs either remain running on ESXi hosts or are restarted on other ESXi hosts.

vCenter Server Failure

No effect.

Stretched SAN fabric outage (FC or iSCSI)

(failure of SAN interconnect between sites)

Host IO automatically continues on local paths in the local site.

Uniform connected hosts:
  • experience some storage path failures for paths to the remote array and continue IO on paths to the local array.
  • in each site will maintain access to local volumes with no more than a pause in IO.

Non-uniform connected hosts:
  • do not have a SAN interconnect between sites, so this scenario is not applicable.

SAN fabric outage in one site

VMware HA automatically restarts VMs on surviving ESXi hosts in the same site or in the other site.

Uniform connected hosts:
  • in the site without the SAN outage, experience some storage path failures for paths to the remote array and continue IO on paths to the local array.
  • in the site with the SAN outage, will experience total loss of access to volumes, and applications must fail over to the other site as mentioned above.

Non-uniform connected hosts:
  • in the site without the SAN outage, will maintain access to local volumes.
  • in the site with the SAN outage, will experience total loss of access to volumes, and applications must fail over to the other site as mentioned above.

Additional Information

The Pure Storage FlashArray Storage Replication Adapter (SRA) is a plugin for VMware Site Recovery Manager (SRM).  ActiveCluster is supported in v3.0 and above.  Asynchronous replication from an ActiveCluster pod is supported in version 3.1 and above. Visit the Pure Storage SRA for SRM Release Notes page for more information.

For an explanation of how Purity ActiveCluster works, check out the following video.
https://www.youtube.com/watch?v=g5BVVjPyVMA&t

For more detailed information about ActiveCluster behaviors and requirements see the ActiveCluster Solution Overview.
https://support.purestorage.com/FlashArray/PurityFA/Protect/Replication/ActiveCluster_Solution_Overview

VMware Cloud Foundation has been tested with ActiveCluster using the VMFS on FC Principal Storage option as well as using both Fibre Channel and iSCSI as Supplemental Storage for Workload Domains that have already been deployed. For more information on this and other ways you can use VMware Cloud Foundation and Pure Storage together, visit this link:
https://support.purestorage.com/Solutions/VMware_Platform_Guide/VMware_Cloud_Foundation
