This article describes how to deploy a vSphere Metro Storage Cluster (vMSC) across two data centers using Dell EMC metro node 7.0.1 and later. With vSphere 7.x, a storage virtualization device can be supported in a Metro Storage Cluster configuration.
Dell EMC metro node arrays enable automated business continuity with zero RPO and RTO. True active-active synchronous replication over metro distances with multi-site dual access gives organizations full confidence that their data will always be available and accessible, with no time associated with recovery.
Metro node provides greater flexibility through multi-platform support, workload granularity, and replication to any array. There is zero performance overhead, no duplicate capacity on the array, and no additional software required on the host. The VM witness technology provides the ability to automatically initiate an instant site failover. Metro node supports local configurations for continuous application availability, data mobility for non-disruptive workload relocation, and storage technology refresh without application downtime.
vSphere Metro Storage Cluster (vMSC) is a new configuration. A storage device configured in the vMSC configuration is supported after vMSC certification equivalency approval from VMware. All supported storage devices are listed in the VMware Storage Compatibility Guide.
Metro node Witness
The metro node Witness enables the metro node solution to improve overall environment availability by arbitrating between a pure communication failure between the two primary sites and an actual site failure in a multi-site architecture.
For metro node 7.0.1 and later, systems can rely on a component known as the metro node Witness. The Witness is an optional component designed for customer environments where regular preference rule sets are insufficient to provide seamless zero or near-zero RTO storage availability in the presence of site disasters, metro node cluster failures, and inter-cluster failures.
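The arbitration role of the Witness can be illustrated with a small model. The following Python sketch is not the product's algorithm; it only encodes the outcomes listed in the tested scenarios table later in this article, assuming a deployed Witness. The `Observation` class, its field names, and the `sites_serving_io` function are hypothetical names introduced for illustration. Combinations that the table does not cover raise an error instead of guessing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Simplified view of what is up and reachable (hypothetical model)."""
    site_a_up: bool = True
    site_b_up: bool = True
    inter_site_link_up: bool = True
    witness_sees_a: bool = True
    witness_sees_b: bool = True


def sites_serving_io(obs: Observation, preferred: str = "A") -> set:
    """Return the sites that keep serving I/O for one Distributed Virtual Volume."""
    alive = {s for s, up in (("A", obs.site_a_up), ("B", obs.site_b_up)) if up}
    if len(alive) < 2:
        # Actual site failure: the surviving site (if any) keeps serving I/O.
        return alive
    if obs.inter_site_link_up:
        # The sites can still talk to each other, so both serve I/O,
        # even when the Witness itself has failed.
        return {"A", "B"}
    # Pure communication failure between the two sites.
    if obs.witness_sees_a and obs.witness_sees_b:
        # The preference rule decides: only the preferred site continues.
        return {preferred}
    if obs.witness_sees_a:
        return {"A"}
    if obs.witness_sees_b:
        return {"B"}
    # Not one of the tested scenarios; do not guess the outcome.
    raise ValueError("combination not covered by the tested scenarios")


# Inter-site link failure plus Witness-to-site-B link failure:
partition = Observation(inter_site_link_up=False, witness_sees_b=False)
print(sites_serving_io(partition, preferred="B"))  # {'A'}
```

In this model, the preference rule alone decides the outcome only when the Witness can still reach both sites during an inter-site link failure.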
Notes:
The ESXi hosts forming the VMware HA cluster can be distributed across the two sites. The HA cluster can start a virtual machine on a surviving ESXi host, and each ESXi host accesses the Distributed Virtual Volume through the storage path at its own site.
Metro node 7.0.1 and later versions and ESXi 7.0 are tested in this configuration with the metro node Witness.
A VMware HA/DRS cluster is created across the two sites using ESXi 7.0 hosts and managed by vCenter Server 7.0. The vSphere Management, vMotion, and virtual machine networks are connected using a redundant network between the two sites. It is assumed that the vCenter Server managing the HA/DRS cluster can connect to the ESXi hosts at both sites.
This diagram provides an overview:
Based on the host SAN connections to the metro node storage cluster, there are two different types of deployments possible:
Non-uniform Host Access – In this type of deployment, the hosts at each site see the storage volumes only through the storage cluster at the same site.
This diagram provides an example:
Uniform Host Access (Cross-Connect) – This deployment involves establishing a front-end SAN across the two sites, so that the hosts at one site can see the storage cluster at the same site as well as the storage cluster at the other site.
These best practices must be performed for this type of deployment:
This diagram provides an example:
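Independently of the diagrams, the difference between the two access types can be expressed as a simple check on which storage clusters each host has SAN paths to. The following Python sketch assumes a hand-built inventory; the function name, the host names, and the site labels "A" and "B" are hypothetical and are not taken from any metro node or vSphere API.

```python
def classify_host_access(host_site, visible_clusters):
    """Classify host access as 'uniform', 'non-uniform', or 'mixed'.

    host_site: maps each ESXi host name to the site where it is located.
    visible_clusters: maps each host name to the set of sites whose
        storage cluster the host has SAN paths to.
    """
    all_sites = set(host_site.values())
    sees_local_only = all(
        visible_clusters[host] == {site} for host, site in host_site.items()
    )
    sees_every_site = all(
        visible_clusters[host] == all_sites for host in host_site
    )
    if sees_every_site and len(all_sites) > 1:
        return "uniform"
    if sees_local_only:
        return "non-uniform"
    # Some hosts are cross-connected and others are not.
    return "mixed"


# Hypothetical inventory: two hosts per site.
hosts = {"esx-a1": "A", "esx-a2": "A", "esx-b1": "B", "esx-b2": "B"}
local_paths = {host: {site} for host, site in hosts.items()}
cross_paths = {host: {"A", "B"} for host in hosts}

print(classify_host_access(hosts, local_paths))  # non-uniform
print(classify_host_access(hosts, cross_paths))  # uniform
```

A "mixed" result indicates that only some hosts are cross-connected, which matches neither of the two deployment types described above.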
A metro node Metro solution federated across the two data centers provides the distributed storage to the ESXi hosts. It is assumed that the ESXi boot disk is located on the internal drives specific to the hosts and not on the Distributed Virtual Volume itself.
Ideally, each virtual machine runs at the preferred site of its Distributed Virtual Volume.
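Placement matters because, during an inter-site link failure, only the preferred site of a Distributed Virtual Volume continues to provide access. The following Python sketch shows one way to flag virtual machines that run away from the preferred site of their volume. All names and the sample inventory are hypothetical; in practice the data would come from your own vCenter and metro node inventory.

```python
def vms_off_preferred_site(vm_site, vm_volume, volume_preferred_site):
    """Return the names of VMs that do not run at their volume's preferred site.

    vm_site: VM name -> site where the VM currently runs.
    vm_volume: VM name -> Distributed Virtual Volume that backs the VM.
    volume_preferred_site: volume name -> preferred site of that volume.
    """
    return sorted(
        vm
        for vm, site in vm_site.items()
        if volume_preferred_site[vm_volume[vm]] != site
    )


# Hypothetical inventory.
vm_site = {"app01": "A", "db01": "B", "web01": "B"}
vm_volume = {"app01": "dvv-1", "db01": "dvv-1", "web01": "dvv-2"}
volume_preferred_site = {"dvv-1": "A", "dvv-2": "B"}

print(vms_off_preferred_site(vm_site, vm_volume, volume_preferred_site))
# ['db01']
```

Virtual machines flagged this way are candidates for DRS rules that keep them at the preferred site of their volume.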
This table outlines tested scenarios:
| Scenario | Metro node Behavior | Impact/Observed VMware HA Behavior |
| --- | --- | --- |
| Single metro node back-end (BE) path failure | Metro node continues to operate using an alternate path to the same BE array. Distributed Virtual Volumes exposed to the ESXi hosts are not impacted. | None. |
| Single front-end (FE) path failure | The ESXi host is expected to use alternate paths to the Distributed Virtual Volumes. | None. |
| BE array failure at site-A | Metro node continues to operate using the array at site-B. When the array is recovered from the failure, the storage volume at site-A is resynchronized from site-B automatically. | None. |
| BE array failure at site-B | Metro node continues to operate using the array at site-A. When the array is recovered from the failure, the storage volume at site-B is resynchronized from site-A automatically. | None. |
| Metro node director failure | Metro node continues to provide access to the Distributed Virtual Volume through other directors on the same metro node cluster. | None. |
| Complete site-A failure | Metro node continues to serve I/O on the surviving site (site-B). When the metro node at the failed site (site-A) is restored, the Distributed Virtual Volumes are synchronized automatically from the active site (site-B). | Virtual machines running at the failed site fail. VMware HA automatically restarts them on the surviving site. There is no downtime if FT is configured on the virtual machines. |
| Complete site-B failure | Metro node continues to serve I/O on the surviving site (site-A). When the metro node at the failed site (site-B) is restored, the Distributed Virtual Volumes are synchronized automatically from the active site (site-A). | Virtual machines running at the failed site fail. VMware HA automatically restarts them on the surviving site. There is no downtime if FT is configured on the virtual machines. |
| Multiple ESXi host failures | None. | VMware HA restarts the virtual machines on any of the surviving ESXi hosts within the VMware HA cluster. |
| Multiple ESXi host management network failures | None. | HA continues to exchange cluster heartbeats through the shared datastore. No virtual machine failovers occur. |
| ESXi host experiences APD (All Paths Down) | None. | In an APD scenario, the ESXi host must be rebooted to recover. Restarting the ESXi host causes VMware HA to restart the failed virtual machines on other surviving ESXi hosts within the VMware HA cluster. |
| Metro node inter-site link failure; vSphere cluster management network intact | Metro node transitions Distributed Virtual Volumes on the non-preferred site to the I/O failure state. On the preferred site, the Distributed Virtual Volumes continue to provide access. | Virtual machines running at the preferred site are not impacted. |
| Metro node cluster failure | I/O continues to be served on all the volumes on the surviving site. | The ESXi hosts located at the failed site experience an APD condition. The ESXi hosts need to be rebooted to recover from the failure. |
| Complete dual site failure | Upon restoration of the two sites, metro node continues to serve I/O. The best practice is to bring up the BE storage arrays first, followed by metro node. | All virtual machines fail since both sites are down. |
| Director failure at one site | The surviving directors within the metro node cluster that contains the failed director continue to provide access to the Distributed Virtual Volumes. | None. |
| Metro node inter-site link intact; vSphere cluster management network failure | None. | Virtual machines on each site continue running on their respective hosts since the HA cluster heartbeats are exchanged through the shared datastore. |
| Metro node inter-site link failure; vSphere cluster management network failure | Metro node fails I/O on the non-preferred site for a given Distributed Virtual Volume. The Distributed Virtual Volume remains accessible at its preferred site. | Powered-on virtual machines running at the preferred site continue to run. |
| Metro node storage volume is unavailable (for example, it is accidentally removed from the storage view or the ESXi initiators are accidentally removed from the storage view) | Metro node continues to serve I/O on the other site where the volume is available. | If I/O is running on the lost device, ESXi detects a PDL (Permanent Device Loss) condition. The virtual machine is killed by the virtual machine monitor and restarted by HA on the other site. |
| Metro node inter-site WAN link failure and simultaneous Cluster Witness to site-B link failure | Metro node fails I/O on the Distributed Virtual Volumes at site-B and continues to serve I/O on site-A. | The virtual machines at site-B have been observed to fail. They can be restarted at site-A. |
| Metro node inter-site WAN link failure and simultaneous Cluster Witness to site-A link failure | Metro node fails I/O on the Distributed Virtual Volumes at site-A and continues to serve I/O on site-B. | The virtual machines at site-A have been observed to fail. They can be restarted at site-B. |
| Metro node Cluster Witness failure | Metro node continues to serve I/O at both sites. | None. |
| Metro node Management Server failure | None. | None. |
| vCenter Server failure | None. | No impact to the running virtual machines or HA. However, the DRS rules and virtual machine placements are not in effect. |
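For operational use, the outcomes in the table can be condensed into a lookup that groups scenarios by the recovery action they imply. The following Python sketch covers only a subset of the rows; the `Action` enum, the shortened scenario names, and the grouping are an illustrative assumption and not part of any Dell or VMware tooling.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action needed for running virtual machines"
    HA_RESTART = "VMware HA restarts the affected virtual machines"
    HOST_REBOOT = "reboot the affected ESXi hosts"


# Subset of the tested scenarios, keyed by a shortened scenario name.
RUNBOOK = {
    "Single BE path failure": Action.NONE,
    "Single FE path failure": Action.NONE,
    "Complete site-A failure": Action.HA_RESTART,
    "Complete site-B failure": Action.HA_RESTART,
    "ESXi host experiences APD": Action.HOST_REBOOT,
    "Metro node cluster failure": Action.HOST_REBOOT,
    "Storage volume unavailable (PDL)": Action.HA_RESTART,
    "Metro node Cluster Witness failure": Action.NONE,
    "vCenter Server failure": Action.NONE,
}


def scenarios_requiring(action):
    """Return the scenario names that map to the given recovery action."""
    return sorted(name for name, a in RUNBOOK.items() if a is action)


print(scenarios_requiring(Action.HOST_REBOOT))
# ['ESXi host experiences APD', 'Metro node cluster failure']
```

The two host-reboot entries are the APD cases from the table, whereas the PDL case is handled by VMware HA without a host reboot.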