EMC VPLEX is a federated solution that provides simultaneous access to storage devices at two geographically separate sites. One or more VPLEX Distributed Virtual Volumes can be provisioned for sharing between the ESXi hosts at the two sites. These volumes can be used as Raw Device Mapping (RDM) disks or as a shared VMFS datastore. An RDM gives a virtual machine exclusive access to the volume, while a VMFS datastore can be used for provisioning virtual machines and carving out additional virtual disks.
The VPLEX cluster at each site is itself designed to be highly available. A VPLEX cluster can scale from two directors to eight directors. Each director is protected by redundant power supplies, fans, and interconnects, making VPLEX highly resilient.
vSphere Metro Storage Cluster (vMSC) is a new configuration. A storage device configured in the vMSC configuration is supported after vMSC certification equivalency approval from VMware. All supported storage devices are listed on the VMware Storage Compatibility Guide.
Notes:
For information on any additional requirements for VPLEX Distributed Virtual Volumes, see the VPLEX™ Overview and General Best Practice.
Note: The preceding links were correct as of August 10, 2016. If you find a link is broken, provide feedback and a VMware employee will update the link.
This diagram provides an example:
Scenario |
VPLEX Behavior |
Impact/Observed VMware HA Behavior |
Single VPLEX back-end (BE) path failure |
VPLEX continues to operate using an alternate path to the same BE array. Distributed Virtual Volumes exposed to the ESXi hosts are not impacted. |
None. |
Single front-end (FE) path failure |
The ESXi server is expected to use alternate paths to the Distributed Virtual Volumes. |
None. |
BE Array failure at site-A |
VPLEX continues to operate using the array at site-B. When the array is recovered from the failure, the storage volume at site-A is resynchronized from site-B automatically. |
None. |
BE array failure at site-B |
VPLEX continues to operate using the array at site-A. When the array is recovered from the failure, the storage volume at site-B is resynchronized from site-A automatically. |
None. |
VPLEX director failure |
VPLEX continues to provide access to the Distributed Virtual Volume through other directors on the same VPLEX cluster. |
None. |
Complete site-A failure (The failure includes all ESXi hosts and the VPLEX cluster at site-A.) |
VPLEX continues to serve I/O on the surviving site (site-B). When the VPLEX at the failed site (site-A) is restored, the Distributed Virtual Volumes are synchronized automatically from the active site (site-B). |
Virtual machines running at the failed site fail. VMware HA automatically restarts them on the surviving site. There is no downtime if you configure FT on the virtual machines. |
Complete site-B failure (The failure includes all ESXi hosts and the VPLEX cluster at site-B.) |
VPLEX continues to serve I/O on the surviving site (site-A). When the VPLEX at site-B is restored, the Distributed Virtual Volumes are synchronized automatically from the active site (site-A). |
Virtual machines running at the failed site fail. VMware HA automatically restarts them on the surviving site. There is no downtime if you configure FT on the virtual machines. |
Multiple ESXi host failure(s) – Power off |
None. |
VMware HA restarts the virtual machines on any of the surviving ESXi hosts within the VMware HA Cluster. |
Multiple ESXi host failure(s) – Network disconnect |
None. |
HA continues to exchange cluster heartbeat through the shared datastore. No virtual machine failovers occur. |
ESXi host experiences APD (All Paths Down) – encountered when the ESXi host loses access to its storage volumes (in this case, VPLEX volumes). |
None. |
In an APD (All Paths Down) scenario, the ESXi host must be rebooted to recover. Restarting the ESXi host causes VMware HA to restart the failed virtual machines on other surviving ESXi hosts within the VMware HA cluster. |
VPLEX inter-site link failure; vSphere cluster management network intact |
VPLEX transitions Distributed Virtual Volumes on the non-preferred site to the I/O failure state. On the preferred site, the Distributed Virtual Volumes continue to provide access. |
Virtual machines running at the preferred site are not impacted. Virtual machines running at the non-preferred site experience I/O failure and show a PDL error. HA fails these virtual machines over to the other site.
In a uniform host access configuration, the virtual machines run without any impact since the ESXi host can still access the distributed volume through the preferred site.
|
VPLEX cluster failure (The VPLEX at either site-A or site-B has failed, but ESXi and other LAN/WAN/SAN components are intact.) |
The I/O continues to be served on all the volumes on the surviving site. |
The ESXi hosts located at the failed site experience an APD condition. The ESXi hosts need to be rebooted to recover from the failure.
In a uniform host access configuration, the virtual machines run without any impact since the ESXi host can still access the distributed volume through the preferred site.
|
Complete dual site failure |
Upon restoration of the two sites, the VPLEX continues to serve I/O. The best practice is to bring up the BE storage arrays first, followed by VPLEX. |
All virtual machines fail since both sites are down.
The ESXi hosts should be brought up only after the VPLEX is fully recovered and the Distributed Virtual Volumes are synchronized.
On powering on the ESXi hosts at each site, the virtual machines are restarted and resume normal operations.
The same impact occurs in a uniform host access configuration since both sites are down.
|
Director failure at one site (preferred site for a given Distributed Virtual Volume) and BE array failure at the other site (secondary site for a given Distributed Virtual Volume) |
The surviving VPLEX directors within the VPLEX cluster with the failed director continue to provide access to the Distributed Virtual Volumes. VPLEX continues to provide access to the Distributed Virtual Volumes using the preferred site BE array. |
None. |
VPLEX inter-site link intact; vSphere cluster management network failure |
None. |
Virtual machines on each site continue running on their respective hosts since the HA cluster heartbeats are exchanged through the shared datastore. |
VPLEX inter-site link failure; vSphere cluster management network failure |
VPLEX fails I/O on the non-preferred site for a given Distributed Virtual Volume. The Distributed Virtual Volume remains accessible at its preferred site. |
Powered-on virtual machines running at the preferred site continue to run. This is an HA split-brain situation: the non-preferred site considers the hosts at the preferred site dead and tries to restart their powered-on virtual machines. Virtual machines running at the non-preferred site see their I/O fail, and the virtual machines fail. These virtual machines can be registered and restarted on the preferred site.
In a uniform host access configuration, the virtual machines run without any impact since the ESXi host can still access the distributed volume through the preferred site. The HA heartbeats are exchanged through the datastore.
|
VPLEX Storage volume is unavailable (for example, it is accidentally removed from the storage view or the ESXi initiators are accidentally removed from the storage view) |
VPLEX continues to serve I/O on the other site, where the volume is available. |
If I/O is running on the lost device, ESXi detects a PDL (Permanent Device Loss) condition. The virtual machine is halted by the virtual machine monitor and restarted by HA on the other site. |
VPLEX inter-site WAN link failure and simultaneous Cluster Witness to site-B link failure |
VPLEX fails I/O on the Distributed Virtual Volumes at site-B and continues to serve I/O on site-A. |
It has been observed that the virtual machines at site-B fail. They can be restarted at site-A.
In a uniform host access configuration, the virtual machines run without any impact since the ESXi hosts at site-B can still access the distributed volume through site-A.
|
VPLEX inter-site WAN link failure and simultaneous Cluster Witness to site-A link failure |
VPLEX fails I/O on the Distributed Virtual Volumes at site-A and continues to serve I/O on site-B. |
It has been observed that the virtual machines at site-A fail. They can be restarted at site-B.
In a uniform host access configuration, the virtual machines run without any impact since the ESXi hosts at site-A can still access the distributed volume through site-B.
|
VPLEX Cluster Witness failure |
VPLEX continues to serve I/O at both sites. |
None. |
VPLEX Management Server failure |
None. |
None. |
vCenter Server failure |
None. |
No impact to the running virtual machines or HA. However, the DRS rules and virtual machine placements are not in effect. |
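The preferred-site rule in the inter-site link failure row (management network intact) can be sketched as a small lookup. This is only an illustrative model of the behavior described in the table, not a VPLEX or vSphere API: the names `HostAccess`, `Outcome`, and `inter_site_link_failure` are hypothetical, and the sketch assumes that "uniform host access" means the ESXi host can also reach the distributed volume through the other site, as the table states.

```python
from dataclasses import dataclass
from enum import Enum


class HostAccess(Enum):
    # Hypothetical labels for the two host access configurations in the table.
    NON_UNIFORM = "non-uniform"  # host reaches the volume only through its local VPLEX
    UNIFORM = "uniform"          # host can also reach the volume through the other site


@dataclass(frozen=True)
class Outcome:
    local_io: str   # VPLEX behavior for the volume at the VM's own site
    vm_impact: str  # observed effect on the virtual machine


def inter_site_link_failure(vm_site: str, preferred_site: str, access: HostAccess) -> Outcome:
    """Model one VM when only the VPLEX inter-site link fails."""
    if vm_site == preferred_site:
        # The preferred site keeps serving the Distributed Virtual Volume.
        return Outcome("served", "not impacted")
    if access is HostAccess.UNIFORM:
        # Local I/O fails, but the host still has paths through the preferred site.
        return Outcome("failed", "not impacted (I/O through preferred site)")
    # Non-preferred site, no remote paths: PDL, then HA restarts the VM elsewhere.
    return Outcome("failed", "PDL error; HA restarts the VM at the other site")


if __name__ == "__main__":
    # Example: site-A is the preferred site for the volume.
    for site in ("site-A", "site-B"):
        for access in HostAccess:
            print(site, access.value, inter_site_link_failure(site, "site-A", access))
```

The same shape could be extended to other rows (for example, the Cluster Witness link failures), but each row would need its own rule taken from the table rather than inferred from this one.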
For more information, see: