This article provides guidelines and vSphere support status for guest deployments using RHEL High Availability Cluster with shared disk resources across nodes in a cluster-across-boxes (CAB) configuration on VMware vSphere 7.0 Update 3 and later.
vSphere Version Support for RHEL High Availability Cluster
Table 1 shows the supported combinations of Red Hat OS and VMware vSphere versions qualified by VMware. VMware does not impose any limitations on, or require certification for, applications using RHEL High Availability Cluster on a supported Red Hat platform. Therefore, any application running on a supported combination of vSphere and Red Hat OS is supported with no additional considerations.
Table 1. Supported Red Hat and vSphere versions

Red Hat Version | Minimum vSphere Version | Maximum Number of RHEL High Availability Cluster Nodes with Shared Storage Supported by ESXi
---|---|---
Red Hat 7.9 | vSphere 7.0 Update 3 | 5
Red Hat 8.2 or later 8.x | vSphere 7.0 Update 3 | 5
Red Hat 9.x | vSphere 7.0 Update 3 | 5
See Support Policies for RHEL High Availability Clusters - fence_scsi and fence_mpath for further details.
*SQL Server 2019 Failover Cluster Instances (FCI) were used to validate the RHEL High Availability Cluster functionality on the vSphere and Red Hat versions listed in this table.
Configuration Guidelines
Storage Configuration
The vSphere options for presenting shared storage to a VM hosting a node of a RHEL High Availability Cluster are shown in Table 2.
Table 2. Supported storage configuration options

vSphere version | vSphere 7.0 Update 3, vSphere 8.0
---|---
Shared disk options | RDM in physical compatibility mode
SCSI bus sharing | Physical
vSCSI Controller type | VMware Paravirtual (PVSCSI)
Storage Protocol / Technology | FC, FCoE, iSCSI
Virtual SCSI Controllers
1. Set scsiX.releaseResvOnPowerOff to FALSE for each SCSI controller providing shared devices, where scsiX is the SCSI controller and its number (such as scsi1, scsi2, and so on). See the example settings after this list.
2. Mixing non-shared and shared disks
Mixing non-shared and shared disks on a single virtual SCSI adapter is not supported. For example, if the system disk is attached to SCSI (0:0), the first clustered disk would be attached to SCSI (1:0). A VM node of a RHEL High Availability Cluster has the same virtual SCSI controller maximum as an ordinary VM - up to four (4) virtual SCSI Controllers.
3. Modify advanced settings for a virtual SCSI controller hosting the boot device.
Add the following advanced settings to the VM nodes:
scsiX.returnNoConnectDuringAPD = "TRUE"
scsiX.returnBusyOnNoConnectStatus = "FALSE"
Where X is the boot device SCSI bus controller ID number. By default, X is set to 0.
4. Virtual disks SCSI IDs should be consistent among all VMs hosting nodes of the same RHEL High Availability Cluster. For example, if the first clustered disk on the first VM node is attached to SCSI (1:0), then it should be attached to the same virtual device node SCSI (1:0) on all the other VM nodes of the cluster.
5. A vNVMe controller is not supported for clustered or non-clustered disks (for example, a boot disk must NOT be placed on a vNVMe controller). See KB 1002149 for more details on how to change the controller for the boot disk.
6. Multi-Writer flag must NOT be used.
7. For the best performance, consider distributing disks evenly among SCSI controllers and use the VMware Paravirtual (PVSCSI) controller.
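For reference, and assuming a layout in which scsi0 hosts the boot disk and scsi1 hosts the shared disks (adjust the controller numbers to match your configuration), the advanced settings from steps 1 and 3 would look as follows in the VM's advanced configuration parameters (Edit Settings > VM Options > Advanced > Edit Configuration):
scsi1.releaseResvOnPowerOff = "FALSE"
scsi0.returnNoConnectDuringAPD = "TRUE"
scsi0.returnBusyOnNoConnectStatus = "FALSE"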
RHEL High Availability Cluster Fencing Agent
SCSI persistent reservation fencing (fence_scsi) must be used to ensure that the services provided by the cluster remain available when a node in the cluster encounters a problem. Without a fence_scsi device configured, there is no way to know that the resources previously used by the disconnected cluster node have been released, which may prevent the services from running on any of the other cluster nodes. Conversely, the system may assume erroneously that the cluster node has released its resources, which can lead to data corruption and data loss. Without such a fencing device configured, data integrity cannot be guaranteed and the cluster configuration will be unsupported.
Follow the configuration guidelines listed in https://access.redhat.com/articles/530533 to configure the fence_scsi device.
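For illustration, a fence_scsi stonith resource is typically created along the following lines (the node names and shared device path below are placeholders; use the device IDs of your own shared disks):
pcs stonith create scsi-fence fence_scsi pcmk_host_list="node1 node2" devices="/dev/disk/by-id/<shared-disk-id>" meta provides=unfencing
The meta provides=unfencing option ensures that nodes are unfenced before cluster resources are started on them.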
In addition to storage fencing, you can also set up and use a virtual watchdog timer (vWDT) with the RHEL High Availability Cluster solution on vSphere.
The VMware virtual watchdog helps the VMs in a clustered setup overcome application crashes or operating system (OS) faults. When the guest OS or an application fails to reset the watchdog timer, the device restarts the VM and, after the restart, informs the guest OS that the restart was caused by a crash. In this way, the vWDT helps the operating system or an application recover from crashes by powering off or resetting the virtual machine.
Support for the virtual watchdog timer (vWDT) with RHEL High Availability Cluster is available from vSphere 8.0 onwards.
Follow the configuration guidelines listed in How to Add a Virtual Watchdog Timer Device to a Virtual Machine to configure the vWDT in a RHEL High Availability Cluster.
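As a quick, hedged sanity check (device naming and the driver in use can vary by guest OS release; recent RHEL kernels often expose the device through the wdat_wdt driver), you can verify from inside the guest that the watchdog device is visible:
ls -l /dev/watchdog*
dmesg | grep -iE 'watchdog|wdat'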
RHEL High Availability Cluster Parameters
Configure the Filesystem resource created on the shared storage with an additional monitoring operation by adding the OCF_CHECK_LEVEL=20 option.
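For reference, a Filesystem resource of this kind is typically created along the following lines (the resource name, device path, mount point, and filesystem type below are placeholders; substitute the values of your shared disk):
pcs resource create shared_fs Filesystem device="/dev/mapper/<shared-disk>" directory="/mnt/shared" fstype="xfs"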
OCF_CHECK_LEVEL=20 causes Pacemaker to perform read/write checks of the filesystem resource. When configuring multiple monitor operations, you must ensure that no two operations are performed at the same interval. Therefore, configure the additional monitoring operation for the Filesystem resource with a different interval than the existing value.
For example, the following command causes Pacemaker to perform the more advanced monitoring check every 61 seconds in addition to the default check.
pcs resource op add <Filesystem Resource ID> monitor interval=61s OCF_CHECK_LEVEL=20
Multipathing Configuration
The Round Robin Path Selection Policy (PSP) is fully supported. Fixed and MRU PSPs can be used as well, but the Round Robin PSP might provide better performance by utilizing all available paths to the storage array.
Note: While choosing a PSP, consult your storage array vendor for the recommended/supported PSP. For more information, see the Storage/SAN Compatibility Guide.
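As an illustrative sketch (the device identifier is a placeholder; confirm the appropriate PSP with your storage array vendor first), the PSP of a specific device can be checked and changed on the ESXi host with esxcli:
esxcli storage nmp device list --device naa.<device-id>
esxcli storage nmp device set --device naa.<device-id> --psp VMW_PSP_RR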
Perennial Reservation Settings
VMware recommends implementing perennial reservations on all ESXi hosts hosting VM nodes with pRDMs. See KB 1016106 for more details.
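For illustration (the device identifier below is a placeholder), a shared pRDM device can be marked as perennially reserved on each ESXi host and the setting verified as follows:
esxcli storage core device setconfig -d naa.<device-id> --perennially-reserved=true
esxcli storage core device list -d naa.<device-id>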
VMware vSphere features support for RHEL High Availability Cluster configurations with shared disks
The following VMware vSphere features are supported for RHEL High Availability Cluster:
- VMware vMotion: See the "Support requirements for vMotion of a VM hosting a node of RHEL High Availability Cluster" section below.
- VMware HA: DRS Affinity rules are required. When creating a DRS Affinity rule, select Separate Virtual Machines. For more details, consult the documentation.
- VMware DRS: See the "Support requirements for vMotion of a VM hosting a node of RHEL High Availability Cluster" section below. In some cases, you might require DRS to treat RHEL High Availability Cluster nodes specially. For example, you might decide that DRS should not migrate VM nodes automatically. You can set VM overrides and set the "Automation Level" for a VM node to a value different from that at the cluster level.
- Online extension of a shared disk resource provided by pRDM.
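After extending a pRDM-backed shared disk online, the guest typically needs to rescan the device before the new size is visible. A hedged sketch for a Linux guest follows (sdX and the multipath map name are placeholders; the exact steps depend on your storage stack and should be verified against Red Hat documentation):
echo 1 > /sys/block/sdX/device/rescan
multipathd resize map <mpath-device>
Grow the filesystem afterwards (for example, with xfs_growfs) as appropriate for the clustered Filesystem resource.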
Support requirements for vMotion of a VM hosting a node of RHEL High Availability Cluster
Before enabling live vMotion, understand Red Hat's position on official support. VMware supports live vMotion (both user- and DRS-initiated) of VM nodes in vSphere 7.0 Update 3 or later with the following requirements:
- The virtual hardware version must be version 11 (vSphere 6.0) or later.
- A RHEL High Availability Cluster node can stall for a few seconds during vMotion. If the stall time exceeds the heartbeat time-out interval, the guest cluster considers the node down, which can lead to unnecessary failover. To allow leeway and make the guest cluster more tolerant, increase the heartbeat time-out interval to 15 seconds. Follow https://access.redhat.com/solutions/221263 to increase the totem token timeout value to 15 seconds (see the example after this list).
- You must configure cluster nodes with a DRS Affinity rule to prevent hosting more than one node of a RHEL High Availability Cluster on a single ESXi host. This means that you must have N+1 ESXi hosts, where N is the number of RHEL High Availability Cluster nodes.
- The vMotion network must use a physical network link with a transmission speed of 10GbE (Ten Gigabit Ethernet) or more. vMotion over a 1GbE (One Gigabit Ethernet) network is not supported.
- Shared disk resources must be accessible by the destination ESXi host.
- A supported version of Red Hat must be used (see Table 1 for the list of supported versions).
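A hedged sketch of the totem token change referenced in the list above (verify the procedure against the Red Hat solution linked there): add or update the token option in the existing totem section of /etc/corosync/corosync.conf, then propagate and reload the configuration on the cluster nodes:
totem {
    token: 15000
}
pcs cluster sync
pcs cluster reload corosync
On recent RHEL 8/9 releases, the same change can reportedly be made in one step with pcs cluster config update totem token=15000.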
Limitations
VMware vSphere features limitations
The following VMware vSphere features are NOT supported for RHEL High Availability Cluster:
- Live Storage vMotion support
- Fault Tolerance
- N-Port ID Virtualization (NPIV)
- Mixed versions of ESXi hosts in a vSphere cluster
VM limitations when hosting a RHEL High Availability Cluster node with shared disks on vSphere
Hot changes to VM hardware might disrupt the heartbeat between the RHEL High Availability Cluster nodes. The following activities are not supported and might cause RHEL High Availability Cluster node failover:
- Hot adding memory
- Hot adding CPU
- Using snapshots
- Cloning a VM
- Suspending or resuming the virtual machine
- Memory over-commitment leading to ESXi swapping or VM memory ballooning
- Sharing disks between virtual machines without a clustering solution (may lead to data corruption)
Disclaimer: VMware is not responsible for the reliability of any data, opinions, advice, or statements made on third-party websites. Inclusion of such links does not imply that VMware endorses, recommends, or accepts any responsibility for the content of such sites.