Configuration
HyperSwap solution components
A minimal HyperSwap solution consists of:
- One system that consists of at least two I/O groups, with each I/O group at a different site and both nodes of an I/O group at the same site
- HyperSwap-protected hosts, each connected to the storage nodes via iSCSI, Fibre Channel, or FCoE
- In addition to the two sites that are defined as failure domains, a third site to house a quorum disk or IP quorum application
Node
A node is an individual server on which the IBM SAN Volume Controller software runs. The nodes are always installed in pairs, and each pair of nodes is known as an I/O group. I/O operations between hosts and system nodes, and between the nodes and arrays, use the SCSI standard. The nodes communicate with each other through private SCSI commands.
I/O group
A pair of nodes is known as an input/output (I/O) group.
Volumes are logical disks that are presented to the SAN by nodes. Volumes are also associated with the I/O group. When an application server processes I/O to a volume, it can access the volume with either of the nodes in the I/O group. When you create a volume, you can specify a preferred node. The other node in the I/O group is used only if the preferred node is not accessible. If you do not specify a preferred node for a volume, the system selects the node in the I/O group that has the fewest volumes to be the preferred node.
The HyperSwap topology locates both nodes of an I/O group at the same site, and you should assign the nodes of at least one I/O group to each of sites 1 and 2. Therefore, to get a volume that is resiliently stored at both sites, at least two I/O groups (four nodes) are required. For larger systems that use the HyperSwap topology, if most volumes are configured as HyperSwap volumes, it is preferable to have two full I/O groups at each site, rather than one I/O group at one site and two at the other, so that the site with only one I/O group does not become a bottleneck.
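The site and topology assignment described above can be scripted. The following Python sketch is an illustration only: the cluster address, node names, and site IDs are hypothetical, and the exact CLI syntax can vary by Spectrum Virtualize code level. It assigns the nodes of two I/O groups to sites 1 and 2 over SSH and then switches the system topology to HyperSwap.

    # Sketch: assign nodes to sites and enable the HyperSwap topology.
    # Assumes SSH key access to the cluster as 'superuser'; the cluster
    # address, node names, and site IDs below are placeholders.
    import subprocess

    CLUSTER = "svc-cluster.example.com"   # hypothetical management address

    def svc(cmd: str) -> None:
        """Run a single Spectrum Virtualize CLI command over SSH."""
        subprocess.run(["ssh", f"superuser@{CLUSTER}", cmd], check=True)

    # Both nodes of an I/O group must be at the same site.
    site_layout = {
        "node1": 1, "node2": 1,   # I/O group 0 -> site 1
        "node3": 2, "node4": 2,   # I/O group 1 -> site 2
    }

    for node, site in site_layout.items():
        svc(f"chnode -site {site} {node}")

    # Switch the system topology once every node (and any host or external
    # controller) has a site assigned.
    svc("chsystem -topology hyperswap")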
HyperSwap volume and consistency group
HyperSwap volumes create copies at separate sites for systems that are configured with the HyperSwap topology. Data that is written to a HyperSwap volume is automatically sent to both copies, so that either site can provide access to the volume if the other site becomes unavailable. HyperSwap volumes are supported on SAN Volume Controller clusters that contain more than one I/O group.
HyperSwap is a system topology that enables disaster recovery and high availability between I/O groups at different locations. Before you configure HyperSwap volumes, the system topology needs to be configured for HyperSwap and sites must be defined.
In addition, the management GUI creates an active-active relationship and change volumes automatically. Active-active relationships manage the synchronous replication of data between HyperSwap volume copies at the two sites. You can specify a consistency group that contains multiple active-active relationships to simplify management of replication and to provide consistency across multiple volumes; a consistency group is commonly used when an application spans multiple volumes. Change volumes maintain a consistent copy of the data during resynchronization, which allows an older copy to be used for disaster recovery if a failure occurs on the up-to-date copy before resynchronization completes.
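As a sketch of how this looks from the CLI, the commands below create a HyperSwap volume with one copy in a pool at each site, create a consistency group, and move the volume's active-active relationship into it. The pool, volume, and group names are hypothetical, and the filter attribute and output field positions are illustrative; on current code levels the active-active relationship and change volumes are created for you by mkvolume (or by the management GUI), so verify the exact syntax against the documentation for your code level.

    # Sketch: create a HyperSwap volume and group its active-active
    # relationship. All object names below are placeholders.
    import subprocess

    CLUSTER = "svc-cluster.example.com"   # hypothetical management address

    def svc(cmd: str) -> str:
        out = subprocess.run(["ssh", f"superuser@{CLUSTER}", cmd],
                             check=True, capture_output=True, text=True)
        return out.stdout

    # One pool per site: data is mirrored synchronously between the copies.
    svc("mkvolume -name app_vol01 -pool Pool_Site1:Pool_Site2 -size 100 -unit gb")

    # Consistency group for all volumes that belong to the same application.
    svc("mkrcconsistgrp -name app_cg")

    # Find the active-active relationship that mkvolume created for the
    # volume (second colon-delimited field is the relationship name on the
    # code levels this sketch assumes), then add it to the group.
    out = svc("lsrcrelationship -filtervalue master_vdisk_name=app_vol01 "
              "-nohdr -delim :")
    rel_name = out.strip().split(":")[1]
    svc(f"chrcrelationship -consistgrp app_cg {rel_name}")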
Quorum
A SAN Volume Controller cluster quorum disk is a managed disk (MDisk) or managed drive that contains a reserved area that is used exclusively for cluster management. The cluster maintains one active quorum disk and two quorum disk candidates (or backup quorums). A cluster uses the quorum disk for two purposes:
- To break a tie when a SAN fault occurs and exactly half of the nodes that were previously members of the cluster are present.
- To hold a copy of important cluster configuration data. Just over 256 MB is reserved for this purpose on each quorum disk.
A HyperSwap SAN Volume Controller cluster typically maintains the active quorum disk at a third site to ensure cluster availability is not impacted by a failure of either primary site.
To use a quorum disk as the quorum device, the third site requires Fibre Channel connectivity and an external storage system. Sometimes, Fibre Channel connectivity is not possible. Starting with Spectrum Virtualize software version 7.6, an IP-based quorum application can be used as the quorum device for the third site, so no Fibre Channel connectivity is required; the quorum application is a Java application that runs on a host at the third site. Even with IP quorum applications at the third site, quorum disks at site one and site two are still required, because they are used to store metadata.
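The sketch below outlines the IP quorum deployment flow under the assumptions stated in the comments: generate the quorum application on the cluster, copy it to a host at the third site, and run it there under Java. The cluster address, third-site host name, and file paths are placeholders; check the product documentation for the prerequisites (IP connectivity, Java level, firewall ports) for your code level, and run the application as a service in production.

    # Sketch: deploy the IP-based quorum application to a third-site host.
    # Addresses and paths are placeholders; verify command names and file
    # locations against the documentation for your code level.
    import subprocess

    CLUSTER = "svc-cluster.example.com"            # hypothetical cluster address
    QUORUM_HOST = "quorum-host.site3.example.com"  # hypothetical third-site host

    def run(args):
        subprocess.run(args, check=True)

    # 1. Generate the quorum application on the cluster (written to /dumps).
    run(["ssh", f"superuser@{CLUSTER}", "mkquorumapp"])

    # 2. Copy the generated Java application to the third-site host.
    run(["scp", f"superuser@{CLUSTER}:/dumps/ip_quorum.jar",
         f"admin@{QUORUM_HOST}:/opt/ipquorum/"])

    # 3. Start it on the third-site host; it must keep running to act as
    #    the tie-break quorum device.
    run(["ssh", f"admin@{QUORUM_HOST}",
         "cd /opt/ipquorum && nohup java -jar ip_quorum.jar &"])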
Site
A site corresponds to a physical location that houses the physical objects of the system. A site is also referred to as a failure domain.
In a SAN Volume Controller HyperSwap configuration, the term site is used to identify components of the SAN Volume Controller that are contained within a boundary, so that any failure that occurs (such as a power failure, fire, or flood) is contained within that boundary and cannot propagate to or affect components that are outside of it. The components that make up a SAN Volume Controller HyperSwap configuration must span three independent sites. Two sites contain the SAN Volume Controller I/O groups and, if you are virtualizing external storage, the storage controllers that contain customer data. The third site contains a storage controller where the active quorum disk is located.
Configuration Requirements
- Directly connect each node to two or more SAN fabrics at the primary and secondary sites (2 - 8 fabrics are supported). Sites are defined as independent failure domains.
- Use a third site to house a quorum disk or IP quorum application. Quorum disks cannot be located on iSCSI-attached storage systems; therefore, iSCSI storage cannot be configured on a third site.
- If a storage system is used at the third site, it must support extended quorum disks. More information is available in the interoperability matrixes that are available at the following website:
- Place independent storage systems at the primary and secondary sites, and use active-active relationships to mirror the host data between the two sites.
- Connections can vary based on fibre type and small form-factor pluggable (SFP) transceiver (longwave and shortwave).
- Nodes that have connections to switches that are longer than 100 meters (109 yards) must use longwave Fibre Channel connections. A longwave small form-factor pluggable (SFP) transceiver can be purchased as an optional component, and must be one of the longwave SFP transceivers that are listed at the following website:
- Avoid using inter-switch links (ISLs) in paths between nodes and external storage systems. If this configuration is unavoidable, do not oversubscribe the ISLs because of substantial Fibre Channel traffic across the ISLs. For most configurations, trunking is required. Because ISL problems are difficult to diagnose, switch-port error statistics must be collected and regularly monitored to detect failures.
- Using a single switch at the third site can lead to the creation of a single fabric rather than two independent and redundant fabrics. A single fabric is an unsupported configuration.
- Ethernet port 1 on every node must be connected to the same subnet or subnets. Ethernet port 2 (if used) of every node must be connected to the same subnet (this might be a different subnet from port 1). The same principle applies to other Ethernet ports.
- Some service actions require physical access to all nodes in a system. If nodes in a HyperSwap system are separated by more than 100 meters, service actions might require multiple service personnel. Contact your service representative to inquire about multiple site support.
- Use consistency groups to manage the volumes that belong to an application. This structure ensures that when a rolling disaster occurs, the out-of-date image is consistent and therefore usable for that application.
Table 1. Metro Cluster software components
Metro Cluster Software Component | Version |
IBM SVC/Storwize HyperSwap | 7.7.0 or newer |
VMware vSphere | 6.0 or newer |
VMware environment
For the most up-to-date information on the recommended version of ESXi, refer to the SAN Volume Controller release notes.
- ESXi hosts should use the Native Multipathing Plug-in (NMP) with the Round Robin path selection policy (PSP) for SAN Volume Controller volumes. This is the default claim rule for SAN Volume Controller, so no action is required; a verification sketch is shown after this list.
- For management and vMotion traffic, the ESXi hosts at both data centers must have a private network on the same IP subnet and broadcast domain. Preferably, management and vMotion traffic should be on separate networks.
- The VMware vCenter must be accessible from all ESXi hosts at both data centers.
- The IP network used by the virtual machines must be accessible from the ESXi hosts at both data centers, so that any virtual machine remains reachable regardless of which ESXi host it runs on after a VMware HA event.
- All datastores used by the ESXi hosts and virtual machines must be accessible from ESXi hosts at both data centers.
- The datastores used by the ESXi hosts and virtual machines must be provisioned on HyperSwap volumes.
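As referenced in the multipathing item above, the sketch below shows one way to confirm which path selection policy ESXi devices are using, so that SAN Volume Controller LUNs can be checked for Round Robin (VMW_PSP_RR). It assumes SSH is enabled on the ESXi host; the host name is a placeholder, and the output parsing is deliberately simple.

    # Sketch: list each device's Path Selection Policy on an ESXi host.
    # Assumes SSH access as root; the host name below is a placeholder.
    import subprocess

    ESXI_HOST = "esxi01.example.com"   # hypothetical ESXi host

    def esxcli(cmd: str) -> str:
        out = subprocess.run(["ssh", f"root@{ESXI_HOST}", f"esxcli {cmd}"],
                             check=True, capture_output=True, text=True)
        return out.stdout

    # 'esxcli storage nmp device list' prints one block per device; the
    # block starts with the device identifier (unindented line) and
    # contains a 'Path Selection Policy:' line.
    devices = esxcli("storage nmp device list")
    current = None
    for line in devices.splitlines():
        if line and not line.startswith(" "):
            current = line.strip()              # device identifier (naa....)
        elif "Path Selection Policy:" in line:
            print(current, "->", line.split(":", 1)[1].strip())

Review the output and confirm that the SAN Volume Controller devices report VMW_PSP_RR.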
Topologies
Figure 1. HyperSwap configuration
The HyperSwap function provides highly available volumes accessible through two sites up to 300 km apart. A fully independent copy of the data is maintained at each site. When data is written by hosts at either site, both copies are synchronously updated before the write operation is completed. The HyperSwap function automatically optimizes itself to minimize the data transmitted between sites and to minimize host read and write latency.
If the nodes or storage at either site go offline, leaving an online and accessible up-to-date copy, the HyperSwap function will automatically fail over access to the online copy. The HyperSwap function also automatically resynchronizes the two copies when possible.
The HyperSwap function in the SAN Volume Controller software works with the standard multipathing drivers that are available on a wide variety of host types, with no additional host support required to access the highly available volume. Where the multipathing driver supports ALUA, the storage system tells the driver which nodes are closest to the host and should be used to minimize I/O latency. You only need to tell the storage system which site each host is connected to, and it configures host pathing optimally.
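For example, the host-to-site assignment mentioned above is a single attribute on the host object. The sketch below sets the site for each ESXi host definition so that the system can report the local I/O group's paths as optimized; the cluster address, host names, and site IDs are hypothetical, and the exact CLI syntax may vary by code level.

    # Sketch: tell the system which site each host is attached to, so ALUA
    # path preferences favor the local I/O group. Names are placeholders.
    import subprocess

    CLUSTER = "svc-cluster.example.com"   # hypothetical management address

    def svc(cmd: str) -> None:
        subprocess.run(["ssh", f"superuser@{CLUSTER}", cmd], check=True)

    host_sites = {
        "esxi01_site1": 1,
        "esxi02_site1": 1,
        "esxi01_site2": 2,
        "esxi02_site2": 2,
    }

    for host, site in host_sites.items():
        svc(f"chhost -site {site} {host}")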
In Figure 1, if the I/O group that owns the primary VDisk (V1_P) goes offline, paths from the second I/O group take over, and SAN Volume Controller internally uses the secondary VDisk (V1_S) to process the data:
- If you lose access to I/O Group 1 from the host, the host multipathing automatically accesses the data via I/O Group 2.
- If you lose only the primary copy of the data, the HyperSwap function forwards requests to I/O Group 2 to service the I/O.
- If you lose I/O Group 1 entirely, the host multipathing automatically accesses the other copy of the data on I/O Group 2.
User scenarios in a HyperSwap configuration
# | Scenario | SVC behavior | VMware vSphere behavior |
1 | Using VMware vMotion or VMware Distributed Resource Scheduler (DRS) to migrate virtual machines between Site 1 and Site 2 | Based on the I/O throughput, the HyperSwap function automatically switches the direction of the relationship to maximize I/O efficiency. | I/O continues with the SVC site at Site 1. |
2 | Failure of all ESXi hosts in Site 1 (power off) | Based on the I/O throughput, the HyperSwap function automatically switches the direction of the relationship to maximize I/O efficiency. | VMware HA automatically restarts the virtual machines on the available ESXi hosts in Site 2. There is no downtime if Fault Tolerance is configured on the virtual machines. |
3 | Host partial path failure (some paths are still alive) | No impact. | No impact on virtual machines. ESXi I/O is redirected to any available active path via PSP (ALUA). |
4 | Failure of all preferred paths on the host (from the SVC site at Site 1); only nonpreferred paths (from the SVC site at Site 2) are alive | Based on the I/O throughput, the HyperSwap function automatically switches the direction of the relationship to maximize I/O efficiency. | ESXi I/O is redirected to nonpreferred paths via PSP (ALUA). No impact on virtual machines. |
5 | Failure of all paths on the host (all paths down, APD); no paths are alive | HA/DRS balances load on the surviving hosts in the cluster. Based on the I/O throughput, the HyperSwap function automatically switches the direction of the relationship to maximize I/O efficiency. | Two options to recover the virtual machines: shut down the ESXi hosts manually so that VMware High Availability restarts the virtual machines on the other hosts, or enable the VMCP capability under the HA settings to handle the datastore APD situation and restart the virtual machines on the other hosts. |
6 | SVC site at Site 1 fails | HyperSwap failover: Secondary HyperSwap volumes/consistency groups on Site 2 become Primary in their HyperSwap relationships, and host I/O is redirected to Site 2. If the SVC site at Site 1 has recovered from the failure and has been receiving a significant amount of writes for a while, the HyperSwap function automatically switches the direction of the relationship to maximize I/O efficiency. | Active paths to the SVC site at Site 1 are reported unavailable. Active paths to the SVC site at Site 2 become preferred. No disruption to virtual machines or ESXi I/O. |
7 | Site 1 failure (both ESXi hosts and SVC) | HyperSwap failover: Secondary HyperSwap volumes/consistency groups on Site 2 become Primary in their HyperSwap relationships. If Site 1 has recovered from the failure and has been receiving a significant amount of writes for a while, the HyperSwap function automatically switches the direction of the relationship to maximize I/O efficiency. | VMware High Availability restarts failed virtual machines on the available ESXi hosts at Site 2. There is no downtime if Fault Tolerance is configured on the failed virtual machines. |
8 | The SVC site that owns the Primary volume loses connectivity with the quorum and with the SVC site that owns the Secondary volume | HyperSwap failover: Secondary HyperSwap volumes/consistency groups on Site 2 become Primary in their respective HyperSwap relationships, and host I/O is redirected to the SVC site at Site 2. | Active paths to HyperSwap volumes on Site 1 are reported unavailable. Active paths to HyperSwap volumes on Site 2 become preferred. No disruption to virtual machines or ESXi I/O. |
9 | Site 2 failure (both ESXi hosts and SVC) | HyperSwap failover: Primary HyperSwap volumes/consistency groups on Site 1 are not affected. If Site 2 has recovered from the failure and has been receiving a significant amount of writes for a while, the HyperSwap function automatically switches the direction of the relationship to maximize I/O efficiency. | No disruption to virtual machines running on Site 1. |
10 | HyperSwap link failure | Synchronization between Primary and Secondary HyperSwap volumes/consistency groups is broken. Secondary HyperSwap volumes/consistency groups stop serving host I/O. Primary HyperSwap volumes/consistency groups continue serving I/O. | No disruption to virtual machines or ESXi I/O. Paths to Primary volumes/consistency groups remain active/preferred. Paths to Secondary volumes/consistency groups become unavailable. |
11 | Quorum failure | Mirroring between Primary and Secondary volumes continues, and both Primary and Secondary keep serving host I/O. | No disruption to virtual machines. An additional failure at this point will not trigger automatic failover and can result in loss of access. |
Additional references
For information about the use of IBM SVC HyperSwap with VMware Site Recovery Manager, refer to:
For more information, see VMware vSphere Metro Storage Cluster Recommended Practices.