Implementing vSphere Metro Storage Cluster using IBM System Storage SAN Volume Controller

Article ID: 343250

Products

VMware vSphere ESXi

Issue/Introduction

This article provides information on implementing a vSphere Metro Storage Cluster using IBM System Storage SAN Volume Controller.

Environment

VMware vSphere ESXi 5.0
VMware vSphere ESXi 5.1
VMware vSphere ESXi 5.5
VMware vSphere ESXi 6.0
VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.7

Resolution

What is vMSC?

vSphere Metro Storage Cluster (vMSC) is a new storage configuration for VMware vSphere environments that is recognized with a unique Hardware Compatibility List (HCL) classification. All supported vMSC storage devices are listed in the VMware Storage Compatibility Guide.

What is IBM SAN Volume Controller?

A stretched IBM SAN Volume Controller configuration in conjunction with VMware vSphere enables transparent VMware vMotion migration and automatic VMware HA failover of virtualized workloads between physical data centers. This image shows a solution overview:
 
 
The IBM System Storage SAN Volume Controller is an enterprise-class storage virtualization system that provides a single point of control for aggregated storage resources. SAN Volume Controller consolidates capacity from different storage systems, both IBM and non-IBM branded, while enabling common copy functions and non-disruptive data movement and improving performance and availability. IBM SAN Volume Controller combines hardware and software into an integrated, modular solution that forms a highly scalable cluster.

An IBM SAN Volume Controller cluster is commonly installed to provide service in a single data center. It can also be installed in a "stretched" configuration, where a single SAN Volume Controller cluster provides service to two data centers at a maximum distance of 300 km. In a stretched configuration, IBM SAN Volume Controller enables a highly available stretched volume to be accessed concurrently at both data center locations.
 

What is the IBM SAN Volume Controller Quorum disk?

A SAN Volume Controller cluster quorum disk is a managed disk (MDisk) or managed drive that contains a reserved area that is used exclusively for cluster management. The cluster maintains one active quorum and two quorum candidates (or backup quorums). A cluster uses the quorum disk for two purposes:
  • To break a tie when a SAN fault leaves exactly half of the nodes that were previously members of the cluster present.
  • To hold a copy of important cluster configuration data. Just over 256 MB is reserved for this purpose on each quorum disk.

A stretched SAN Volume Controller cluster typically maintains the active quorum disk at a third site so that a failure of either primary site does not affect cluster availability.
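The tie-break role of the quorum disk can be illustrated with a minimal sketch. It is not IBM's implementation: the function name and parameters are hypothetical, and the rule that a strict majority of nodes keeps running without consulting the quorum disk (and a strict minority stops) is an assumption taken from common quorum designs, not from this article. Only the "exactly half" case is described above.

```python
def keeps_running(nodes_present: int, previous_members: int,
                  holds_active_quorum: bool) -> bool:
    """Decide whether a partition of the cluster keeps serving I/O.

    nodes_present: nodes still visible to this partition after the fault.
    previous_members: nodes that were members of the cluster before the fault.
    holds_active_quorum: True if this partition reached the active quorum disk.
    """
    if previous_members <= 0 or not 0 <= nodes_present <= previous_members:
        raise ValueError("node counts are inconsistent")

    if 2 * nodes_present > previous_members:
        # Assumption: a strict majority continues without a tie-break.
        return True
    if 2 * nodes_present == previous_members:
        # Exactly half the nodes are present: the quorum disk breaks the tie.
        return holds_active_quorum
    # Assumption: a strict minority stops to avoid a split-brain condition.
    return False


# Two-node stretched cluster after an inter-site fault: each site sees one node.
site_1_continues = keeps_running(1, 2, holds_active_quorum=True)
site_2_continues = keeps_running(1, 2, holds_active_quorum=False)
```

In this two-node example only the site that reaches the active quorum disk continues, which is why placing that disk at a third site keeps the decision independent of either primary site.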

Configuration Requirements

These configuration requirements must be fulfilled to support VMware HA, DRS, and vMotion functions between data centers with a stretched SVC cluster:

  • VMware vCenter server must be able to connect to vSphere hosts at both locations.
  • An IP network with a minimum bandwidth of 622 Mbps for vSphere hosts participating in vMotion.
  • Maximum latency of 5 milliseconds (ms) between hosts participating in vMotion, or 10 ms between hosts participating in enhanced vMotion.
  • Source and destination vSphere hosts must have a network interface on the same IP subnet and broadcast domain.
  • The same IP network on which the virtual machines reside must be accessible to vSphere hosts at both data center locations.
  • Datastores on which the virtual machine boot drives reside must be accessible to vSphere hosts at both data center locations.
  • The number of vSphere hosts in an HA-enabled cluster must not exceed 32.
  • The IBM SAN Volume Controller cluster must run software version 7.7.0 or later and be configured in a supported hardware configuration, as outlined in Deployment options for IBM SAN Volume Controller inter-cluster connections.

Table 1. Metro Cluster Software Components

| Metro Cluster Software Component | Version |
| --- | --- |
| IBM SVC Stretched Cluster | 7.7.0 or newer |
| VMware vSphere | 6.0 or newer |
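
The numeric requirements above lend themselves to a simple pre-deployment check. The following is a rough sketch only: the `StretchedClusterPlan` class, its field names, and `check_plan` are hypothetical helpers invented for illustration, and the input values are assumed to have been measured or collected by other means. The thresholds themselves come from the requirements listed above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Thresholds taken from the configuration requirements in this article.
MIN_VMOTION_BANDWIDTH_MBPS = 622
MAX_VMOTION_LATENCY_MS = 5
MAX_ENHANCED_VMOTION_LATENCY_MS = 10
MAX_HA_CLUSTER_HOSTS = 32
MIN_SVC_VERSION = (7, 7, 0)


@dataclass
class StretchedClusterPlan:
    """Hypothetical description of a planned deployment."""
    vmotion_bandwidth_mbps: float
    inter_host_latency_ms: float
    enhanced_vmotion: bool
    ha_cluster_hosts: int
    svc_version: str  # dotted version string, for example "7.8.1"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '7.7' or '7.7.0' into a tuple padded to at least three parts."""
    parts = [int(p) for p in text.strip().split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_plan(plan: StretchedClusterPlan) -> List[str]:
    """Return a list of unmet requirements (empty if none were found)."""
    problems = []
    if plan.vmotion_bandwidth_mbps < MIN_VMOTION_BANDWIDTH_MBPS:
        problems.append("vMotion network bandwidth is below 622 Mbps")
    limit = (MAX_ENHANCED_VMOTION_LATENCY_MS if plan.enhanced_vmotion
             else MAX_VMOTION_LATENCY_MS)
    if plan.inter_host_latency_ms > limit:
        problems.append(f"latency between hosts exceeds {limit} ms")
    if plan.ha_cluster_hosts > MAX_HA_CLUSTER_HOSTS:
        problems.append("HA-enabled cluster has more than 32 hosts")
    if parse_version(plan.svc_version) < MIN_SVC_VERSION:
        problems.append("SAN Volume Controller software is older than 7.7.0")
    return problems


plan = StretchedClusterPlan(1000, 8.0, False, 40, "7.6.1")
for problem in check_plan(plan):
    print(problem)
```

The check covers only the requirements that can be expressed as numbers; connectivity, subnet, and datastore accessibility requirements still need to be verified in the environment itself.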

Deployment options for IBM SAN Volume Controller inter-cluster connections

IBM SAN Volume Controller is categorized as a Uniform Host Access vMSC device. Uniform Host Access (or cross-connect) means that the hosts at each site can access the SAN Volume Controller cluster nodes at the local and remote site through inter-cluster connections. SAN Volume Controller supports two methods for inter-cluster connections:

Node-to-node paths without switch inter-site links – In this configuration each SAN Volume Controller node is attached directly to the Fibre Channel switches in the local and remote sites, and the active quorum disk is attached to Fibre Channel switches in both sites. This type of configuration requires the SAN Volume Controller cluster to be running software version 7.7.0 or later, and the maximum supported distance is 10 km. This image shows this configuration:
 
 
Node-to-node paths with switch inter-site links – In this configuration each SAN Volume Controller node is attached directly to the Fibre Channel switches in the local site, and switch inter-site links provide connectivity to the remote Fibre Channel switches. Inter-site links can also be used to connect the active quorum disk to each site. This type of configuration requires the SAN Volume Controller cluster to be running software version 7.7.0 or newer, and the maximum supported distance is 300 km. This image shows this configuration:
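
As a planning aid, the distance limits of the two inter-cluster connection methods described above can be captured in a short lookup. This is an illustrative sketch: the enum and function names are hypothetical, and only the 10 km and 300 km limits come from this article.

```python
from enum import Enum
from typing import List


class InterClusterLink(Enum):
    """The two inter-cluster connection methods described in this article."""
    WITHOUT_SWITCH_ISL = "node-to-node paths without switch inter-site links"
    WITH_SWITCH_ISL = "node-to-node paths with switch inter-site links"


# Maximum supported distance between sites, in kilometers.
MAX_DISTANCE_KM = {
    InterClusterLink.WITHOUT_SWITCH_ISL: 10,
    InterClusterLink.WITH_SWITCH_ISL: 300,
}


def supported_links(distance_km: float) -> List[InterClusterLink]:
    """Return the connection methods whose distance limit covers the sites."""
    if distance_km < 0:
        raise ValueError("distance must not be negative")
    return [link for link in InterClusterLink
            if distance_km <= MAX_DISTANCE_KM[link]]


for link in supported_links(120):
    print(link.value)
```

For sites 120 km apart, only the configuration with switch inter-site links remains; beyond 300 km the function returns an empty list because neither method is supported.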
 
 

Supported use cases

This list describes the supported use cases of a stretched SAN Volume Controller cluster and VMware vSphere:
  • A stretched SAN Volume Controller cluster presenting an accessible VMware VMFS volume to vSphere hosts at two separate data center locations separated by a distance of up to 300 km.
  • A single stretched vSphere cluster leveraging VMware HA and DRS functions with hosts at two separate data center locations separated by a distance of up to 300 km.
  • VMware vMotion between vSphere hosts at two separate data center locations separated by a distance of up to 300 km.
  • Automatic VMware HA failover of virtual machines between data centers due to server, storage, or site failure.

Tested Scenarios

This table outlines the tested and supported failure scenarios when using a stretched SAN Volume Controller cluster and VMware vSphere:
 
| Failure Scenario | SAN Volume Controller Behavior | VMware HA Impact |
| --- | --- | --- |
| Path Failure – SAN Volume Controller Back End (BE) Port | Single path failure between SAN Volume Controller node and storage subsystems. No impact to volume mirroring. | No impact |
| Path Failure – SAN Volume Controller Front End (FE) Port | Single path failure between SAN Volume Controller node and ESXi host. ESXi host uses alternate paths. | No impact |
| BE Array Failure at site-1 | SAN Volume Controller cluster continues to operate from the volume copy at site-2. When the array at site-1 is available again, the volume mirror synchronizes the volume copies. | No impact |
| BE Array Failure at site-2 | SAN Volume Controller cluster continues to operate from the volume copy at site-1. When the array at site-2 is available again, the volume mirror synchronizes the volume copies. | No impact |
| SAN Volume Controller Node Failure | SAN Volume Controller cluster continues to provide access to all volumes through the other SAN Volume Controller node. | No impact |
| Complete site-1 failure (the failure includes all ESXi hosts and the SAN Volume Controller nodes at site-1) | SAN Volume Controller cluster continues to provide access to all volumes through the other SAN Volume Controller node. When the node at site-1 is restored, the volume mirror is restarted and the volumes are synchronized. | Virtual machines running on ESXi hosts at the failed site are impacted. VMware HA automatically restarts them on ESXi hosts at site-2. |
| Complete site-2 failure (the failure includes all ESXi hosts and the SAN Volume Controller nodes at site-2) | SAN Volume Controller cluster continues to provide access to all volumes through the other SAN Volume Controller node. When the node at site-2 is restored, the volume mirror is restarted and the volumes are synchronized. | Virtual machines running on ESXi hosts at the failed site are impacted. VMware HA automatically restarts them on ESXi hosts at site-1. |
| Multiple ESXi host failure(s) – Power Off | No impact | VMware HA automatically restarts the virtual machines on available ESXi hosts in the VMware HA cluster. |
| Multiple ESXi host failure(s) – Network disconnect | No impact | VMware HA continues to utilize the datastore heartbeat to exchange cluster heartbeats. No impact. |
| SAN Volume Controller inter-site link failure, vSphere cluster management network failure | SAN Volume Controller active quorum is utilized to prevent a split-brain scenario by coordinating one node to remain active and the other node to go offline. ESXi hosts continue to access available volumes through the remaining node. | No impact |
| Active SAN Volume Controller Quorum Disk Failure | No impact to volume access. A secondary quorum disk is assigned upon failure of the active quorum. Volume mirroring is paused until a new active quorum becomes available, and then the volume copies are synchronized. | No impact |
| vSphere Host Isolation | No impact | HA event dependent upon isolation response rules. Virtual machines can be left on, or rules can dictate that virtual machines shut down and restart on other hosts in the cluster. |
| vCenter server failure | No impact | No impact to running virtual machines or VMware HA. VMware DRS functionality is impacted until vCenter access is restored. |
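
For runbooks or monitoring annotations, the VMware HA column of the tested scenarios can be condensed into a small lookup. The sketch below is only one way to encode it: the enum, the short scenario keys, and the function name are hypothetical, and the three-way classification is a simplification of the table text.

```python
from enum import Enum


class HaImpact(Enum):
    NONE = "no impact"
    RESTART = "VMware HA restarts the affected virtual machines"
    ISOLATION_RESPONSE = "depends on the isolation response rules"


# Hypothetical short keys for the tested failure scenarios in the table.
TESTED_SCENARIOS = {
    "be_port_path_failure": HaImpact.NONE,
    "fe_port_path_failure": HaImpact.NONE,
    "be_array_failure": HaImpact.NONE,
    "svc_node_failure": HaImpact.NONE,
    "complete_site_failure": HaImpact.RESTART,
    "multiple_host_power_off": HaImpact.RESTART,
    "multiple_host_network_disconnect": HaImpact.NONE,
    "inter_site_link_and_management_network_failure": HaImpact.NONE,
    "active_quorum_disk_failure": HaImpact.NONE,
    "host_isolation": HaImpact.ISOLATION_RESPONSE,
    "vcenter_failure": HaImpact.NONE,
}


def expected_ha_impact(scenario: str) -> HaImpact:
    """Look up the documented HA impact; untested scenarios raise KeyError."""
    return TESTED_SCENARIOS[scenario]


print(expected_ha_impact("complete_site_failure").value)
```

Raising `KeyError` for unknown keys is deliberate: a scenario that is not in the table has not been tested, so the sketch avoids implying any expected behavior for it.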
 
 
 


Additional Information