What is vMSC?
vSphere Metro Storage Cluster (vMSC) is a new storage configuration for VMware vSphere environments, recognized with a unique Hardware Compatibility List (HCL) classification. All supported vMSC storage devices are listed on the VMware Storage Compatibility Guide.
What is IBM SAN Volume Controller?
A stretched IBM SAN Volume Controller configuration in conjunction with VMware vSphere enables transparent VMware vMotion migration and automatic VMware HA failover of virtualized workloads between physical data centers. This image shows a solution overview:
The IBM System Storage SAN Volume Controller is an enterprise-class storage virtualization system that provides a single point of control for aggregated storage resources. SAN Volume Controller consolidates capacity from different storage systems, both IBM and non-IBM branded, while enabling common copy functions and non-disruptive data movement and improving performance and availability. IBM SAN Volume Controller combines hardware and software into an integrated, modular solution that forms a highly scalable cluster. An IBM SAN Volume Controller cluster is commonly installed to provide service in a single data center. It can also be installed in a “stretched” configuration, where a single SAN Volume Controller cluster provides service to two data centers at a maximum distance of 300 km. In a stretched configuration, IBM SAN Volume Controller enables a highly available stretched volume to be accessed concurrently at both data center locations.
What is the IBM SAN Volume Controller Quorum disk?
A SAN Volume Controller cluster quorum disk is a managed disk (MDisk) or managed drive that contains a reserved area that is used exclusively for cluster management. The cluster maintains one active quorum and two quorum candidates (or backup quorums). A cluster uses the quorum disk for two purposes:
- To break a tie when a SAN fault occurs and exactly half of the nodes that were previously members of the cluster are present.
- To hold a copy of important cluster configuration data. Just over 256 MB is reserved for this purpose on each quorum disk.
A stretched SAN Volume Controller cluster typically maintains the active quorum disk at a third site so that cluster availability is not impacted by a failure of either primary site.
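The tie-break role of the quorum disk can be pictured with a small model. This is a simplified sketch, not IBM's implementation: the function name and parameters are hypothetical, and the treatment of a partition holding fewer than half of the nodes (it goes offline) is an assumption based on ordinary majority rules rather than something stated in this article.

```python
def partition_stays_online(nodes_present: int,
                           nodes_previously_in_cluster: int,
                           wins_active_quorum: bool) -> bool:
    """Simplified model of the quorum tie-break after a SAN fault.

    nodes_present: nodes that can still see each other in this partition.
    nodes_previously_in_cluster: cluster membership before the fault.
    wins_active_quorum: whether this partition reaches the active quorum disk first.
    """
    if nodes_previously_in_cluster <= 0:
        raise ValueError("cluster must have had at least one node")
    if not 0 <= nodes_present <= nodes_previously_in_cluster:
        raise ValueError("nodes_present must be between 0 and the previous membership")

    doubled = 2 * nodes_present
    if doubled > nodes_previously_in_cluster:
        # More than half of the nodes are present: no tie to break.
        return True
    if doubled == nodes_previously_in_cluster:
        # Exactly half are present: the active quorum disk breaks the tie.
        return wins_active_quorum
    # Assumption: a partition with fewer than half of the nodes goes offline.
    return False
```

In a two-node stretched cluster that loses its inter-site links, each site sees exactly half of the nodes, so only the site for which `wins_active_quorum` is true remains online in this model.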
Configuration Requirements
These configuration requirements must be fulfilled to support VMware HA, DRS, and vMotion functions between data centers with a stretched SVC cluster:
- VMware vCenter server must be able to connect to vSphere hosts at both locations.
- An IP network with a minimum bandwidth of 622 Mbps for vSphere hosts participating in vMotion.
- Maximum latency of 5 milliseconds (ms) between hosts participating in vMotion, or 10 ms between hosts participating in enhanced vMotion.
- Source and destination vSphere hosts must have a network interface on the same IP subnet and broadcast domain.
- The same IP network on which the virtual machines reside must be accessible to vSphere hosts at both data center locations.
- Datastores on which the virtual machine boot drives reside must be accessible to vSphere hosts at both data center locations.
- The number of vSphere hosts in an HA-enabled cluster must not exceed 32.
- IBM SAN Volume Controller cluster running software version 7.7.0 or later, configured in a supported hardware configuration as outlined in Deployment options for IBM SAN Volume Controller inter-cluster connections.
Table 1: Metro Cluster Software Components
Metro Cluster Software Components | Version |
IBM SVC Stretched Cluster | 7.7.0 or newer |
VMware vSphere | 6.0 or newer |
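The numeric requirements above lend themselves to a pre-deployment sanity check. The following is a minimal sketch only: `StretchedClusterPlan`, `parse_version`, and `check_requirements` are hypothetical names invented for illustration, not part of any VMware or IBM tool, and the thresholds are taken directly from the list and table above. It does not cover the connectivity, subnet, or datastore requirements.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StretchedClusterPlan:
    """Hypothetical description of a planned deployment (not a vendor API)."""
    vmotion_bandwidth_mbps: float
    vmotion_latency_ms: float
    uses_enhanced_vmotion: bool
    ha_cluster_host_count: int
    svc_software_version: str
    vsphere_version: str


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '7.7.0' into (7, 7, 0); short versions such as '6.0' are zero-padded."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def check_requirements(plan: StretchedClusterPlan) -> List[str]:
    """Return a list of unmet requirements; an empty list means none were found."""
    problems = []
    if plan.vmotion_bandwidth_mbps < 622:
        problems.append("vMotion network bandwidth is below 622 Mbps")
    # 5 ms limit for vMotion, 10 ms for enhanced vMotion.
    latency_limit_ms = 10 if plan.uses_enhanced_vmotion else 5
    if plan.vmotion_latency_ms > latency_limit_ms:
        problems.append(f"latency exceeds {latency_limit_ms} ms")
    if plan.ha_cluster_host_count > 32:
        problems.append("HA-enabled cluster has more than 32 hosts")
    if parse_version(plan.svc_software_version) < (7, 7, 0):
        problems.append("SAN Volume Controller software is older than 7.7.0")
    if parse_version(plan.vsphere_version) < (6, 0, 0):
        problems.append("vSphere is older than 6.0")
    return problems
```

For example, `check_requirements(StretchedClusterPlan(1000, 4.0, False, 16, "7.8.1", "6.5"))` returns an empty list, because every value is within the limits listed above.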
Deployment options for IBM SAN Volume Controller inter-cluster connections
IBM SAN Volume Controller is categorized as a Uniform Host Access vMSC device. Uniform Host Access (or cross-connect) means that the hosts at each site can access the SAN Volume Controller cluster nodes at both the local and remote sites through inter-cluster connections. SAN Volume Controller supports two methods for inter-cluster connections:
Node-to-node paths without switch inter-site links – In this configuration, each SAN Volume Controller node is attached directly to the Fibre Channel switches in the local and remote sites, and the active quorum disk is attached to Fibre Channel switches in both sites. This type of configuration requires the SAN Volume Controller cluster to be running software version 7.7.0 or later, and the maximum supported distance is 10 km. This image shows this configuration:
Node-to-node paths with switch inter-site links – In this configuration, each SAN Volume Controller node is attached directly to the Fibre Channel switches in the local site, and switch inter-site links provide connectivity to the remote Fibre Channel switches. Inter-site links can also be used to connect the active quorum disk to each site. This type of configuration requires the SAN Volume Controller cluster to be running software version 7.7.0 or later, and the maximum supported distance is 300 km. This image shows this configuration:
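As a rough illustration of how the distance limits narrow the choice between the two connection methods, the sketch below maps a site-to-site distance to the methods whose stated maximum it does not exceed. The function and constant names are hypothetical, and distance is the only criterion considered; a real design must also meet the software version and cabling requirements described above.

```python
from typing import List

# Labels for the two connection methods described above.
WITHOUT_ISL = "node-to-node paths without switch inter-site links"
WITH_ISL = "node-to-node paths with switch inter-site links"

# Maximum supported distances, in kilometres, as stated in this article.
MAX_DISTANCE_KM = {WITHOUT_ISL: 10, WITH_ISL: 300}


def methods_for_distance(distance_km: float) -> List[str]:
    """Return the connection methods whose distance limit covers distance_km."""
    if distance_km < 0:
        raise ValueError("distance must not be negative")
    return [method for method, limit in MAX_DISTANCE_KM.items()
            if distance_km <= limit]
```

With these limits, sites 50 km apart are left with only the inter-site link method, and sites more than 300 km apart get an empty list, meaning neither method is supported.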
Supported use cases
This list describes the supported use cases of a stretched SAN Volume Controller cluster and VMware vSphere:
- A stretched SAN Volume Controller cluster presenting an accessible VMware VMFS volume to vSphere hosts at two separate data center locations up to 300 km apart.
- A single stretched vSphere cluster leveraging VMware HA and DRS functions with hosts at two separate data center locations up to 300 km apart.
- VMware vMotion between vSphere hosts at two separate data center locations up to 300 km apart.
- Automatic VMware HA failover of virtual machines between data centers due to server, storage, or site failure.
Tested Scenarios
This table outlines the tested and supported failure scenarios when using a stretched SAN Volume Controller cluster and VMware vSphere:
Failure Scenario | SAN Volume Controller Behavior | VMware HA Impact |
Path Failure – SAN Volume Controller Back End (BE) Port | Single path failure between SAN Volume Controller node and storage subsystems. No impact to volume mirroring. | No impact |
Path Failure – SAN Volume Controller Front End (FE) Port | Single path failure between SAN Volume Controller node and ESXi host. ESXi host uses alternate paths. | No impact |
BE Array Failure at site-1 | SAN Volume Controller cluster continues to operate from the volume copy at site-2. When the array at site-1 becomes available, the volume mirror synchronizes the volume copies. | No impact |
BE Array Failure at site-2 | SAN Volume Controller cluster continues to operate from the volume copy at site-1. When the array at site-2 becomes available, the volume mirror synchronizes the volume copies. | No impact |
SAN Volume Controller Node Failure | SAN Volume Controller cluster continues to provide access to all volumes through the other SAN Volume Controller node. | No impact |
Complete site-1 failure (the failure includes all ESXi hosts and the SAN Volume Controller nodes at site-1) | SAN Volume Controller cluster continues to provide access to all volumes through the other SAN Volume Controller node. When the node at site-1 is restored, the volume mirror is restarted and volumes are synchronized. | Virtual machines running on ESXi hosts at the failed site are impacted. VMware HA automatically restarts them on ESXi hosts at site-2. |
Complete site-2 failure (the failure includes all ESXi hosts and the SAN Volume Controller nodes at site-2) | SAN Volume Controller cluster continues to provide access to all volumes through the other SAN Volume Controller node. When the node at site-2 is restored, the volume mirror is restarted and volumes are synchronized. | Virtual machines running on ESXi hosts at the failed site are impacted. VMware HA automatically restarts them on ESXi hosts at site-1. |
Multiple ESXi host failure(s) – Power Off | No impact | VMware HA automatically restarts the virtual machines on available ESXi hosts in the VMware HA cluster. |
Multiple ESXi host failure(s) – Network disconnect | No impact | VMware HA continues to utilize the datastore heartbeat to exchange cluster heartbeats. No impact. |
SAN Volume Controller inter-site link failure, vSphere cluster management network failure | The SAN Volume Controller active quorum is used to prevent a split-brain scenario by coordinating one node to remain active and the other node to go offline. | ESXi hosts continue to access available volumes through the remaining node. No impact. |
Active SAN Volume Controller Quorum Disk Failure | No impact to volume access. A secondary quorum disk is assigned upon failure of active quorum. Volume mirroring is paused until a new active quorum becomes available and then the volume copies are synchronized. | No impact. |
vSphere Host Isolation | No impact | The HA event depends on the isolation response rules. Virtual machines can be left powered on, or the rules can direct virtual machines to shut down and restart on other hosts in the cluster. |
vCenter server failure | No impact | No impact to running virtual machines or VMware HA. VMware DRS functionality is impacted until vCenter access is restored. |