Microsoft Windows Server Failover Clustering (WSFC) with shared disks on VMware vSphere 8.x: Guidelines for supported configurations


Article ID: 313472


Products

VMware vSphere ESXi

Issue/Introduction

VMware vSphere 8.x provides flexibility and choice in architecting high-availability solutions using Windows Server as the guest operating system (OS). VMware vSphere 8.x supports Windows Server Failover Clustering (WSFC) with shared (clustered) disk resources by transparently passing SCSI-3 Persistent Reservation (SCSI-3 PR) commands to the underlying storage, or by emulating them at the datastore level. These commands are required for a WSFC node (a VM participating in a WSFC, referred to below as a VM node) to arbitrate access to a shared disk. It is a general best practice to ensure that each node participating in a WSFC has the same configuration.

Information in this article applies to configurations in which the VMs hosting nodes of a WSFC (VM nodes) are located on different ESXi hosts – known as “cluster-across-boxes” (CAB). CAB provides high availability (HA) from both the in-guest and the vSphere perspective. VMware does not recommend configurations in which all VM nodes are placed on a single ESXi host (a so-called “cluster-in-a-box”, or CIB). The CIB configuration should not be used for any production implementation – if the single ESXi host fails, all cluster nodes are powered off and, as a result, your application experiences downtime.
This KB assumes that all components underneath the WSFC-based VMs are architected to provide the proper availability and conform to Microsoft’s supportability requirements as they relate to the in-guest configuration described in this article.

Note: A single WSFC consisting of both physical nodes and virtual machines is supported. For more information, see the Cluster Physical and Virtual Machines section in the Setup for Failover Clustering and Microsoft Cluster Service Guide.


Environment

VMware vSphere ESXi 8.0.x
 

Resolution

This article provides guidelines and vSphere support status for guest deployments using Microsoft Windows Server Failover Clusters (WSFCs) with shared disk resources across nodes in CAB configuration on VMware vSphere 8.x.

 

vSphere Version Support for WSFC

Table 1 shows the supported combinations of Windows Server and VMware vSphere versions qualified by VMware. VMware neither imposes limitations on nor requires certification for applications using WSFC on a supported Windows platform. Therefore, any application (including Microsoft SQL Server) running on a supported combination of vSphere and Windows OS is supported with no additional considerations.
 
Note: Other WSFC-based solutions that do not access shared disks (Microsoft SQL Server Always On Availability Groups (AGs) or Microsoft Exchange Database Availability Groups (DAGs)) require no special storage configuration on the vSphere side (VMFS or NFS). This KB should not be used for such configurations.

Table 1. Versions of Windows Server Supported by vSphere for a WSFC

Windows Server Version1 | Maximum Number of WSFC Nodes with Shared Storage Supported by ESXi
2022                    | 5
2019                    | 5
2016                    | 5
2012 / 2012 R2          | 5

 

  1. SQL Server 2016, 2017 and 2019 Failover Cluster Instances (FCI) were used to validate the WSFC functionality on vSphere and Windows Server versions listed in this table. 
  2. If the cluster validation wizard completes with the warning: ”Validate Storage Spaces Persistent Reservation,” you can safely ignore the warning. This check applies to the Microsoft Storage Spaces feature, which does not apply to VMware vSphere.
  3. An ESXi host supports up to sixteen (16) WSFC clusters (i.e., multi-cluster) running on the same ESXi host.

 

VMware vSphere features support for WSFC Configurations with shared disks

The following VMware vSphere features are supported for WSFC:

  • VMware HA. DRS anti-affinity rules are required. When creating a DRS affinity rule, select Separate Virtual Machines. For more details, consult the documentation.
  • VMware DRS. See Support requirements for vMotion of a VM hosting a node of WSFC below.
  • Offline (cold) Storage vMotion of a VM node is supported when the target host is on vSphere 8.x or 7.x, vSAN 6.7 Update 3, or VMware Cloud on AWS. Follow the process described in Migrating VMs with shared or multi-writer disks for VMs with shared disks.

The following VMware vSphere features are NOT supported for WSFC:

  • Live Storage vMotion.
  • Fault Tolerance (FT).
  • N-Port ID Virtualization (NPIV).
  • Mixed versions of ESXi hosts in a vSphere cluster in production use.

 

Support requirements for vMotion of a VM hosting a node of WSFC

Live vMotion (both user- and DRS-initiated) of VM nodes is supported in vSphere 8.x under the following requirements:

  • The VM virtual hardware version (“VM compatibility”) must be version 11 (vSphere 6.0) or later.
  • The value of the SameSubnetThreshold parameter of Windows cluster health monitoring must be modified to allow at least 10 missed heartbeats. This is the default in Windows Server 2016. This recommendation applies to all applications using WSFC, whether with shared or non-shared disks.
  • Cluster nodes must be configured with a DRS anti-affinity rule to prevent more than one node of a WSFC from running on a single ESXi host. This means that you must have N+1 ESXi hosts, where N is the number of WSFC nodes.
  • The vMotion network must use a physical link with a transmission speed of 10 Gigabit Ethernet (10GbE) or more. vMotion over 1GbE is not supported.
  • Shared disk resources must be accessible by the destination ESXi host.
  • A supported version of Windows Server must be used (see Table 1 for the list of supported versions).
  • For optimal performance, and to avoid failover of cluster roles to other nodes, it is not recommended to migrate more than eight (8) WSFC virtual machines holding clustered shared resources at the same time.
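As an illustration, the anti-affinity and heartbeat requirements above can be applied from the command line. The following is a minimal sketch, assuming VMware PowerCLI on a management host and the FailoverClusters PowerShell module inside the guest; the cluster and VM names ("Cluster01", "node1-vm", "node2-vm") are placeholders:

```powershell
# --- On a management host with VMware PowerCLI installed ---
# Keep the two WSFC VM nodes on different ESXi hosts (DRS anti-affinity rule).
$vms = Get-VM -Name "node1-vm","node2-vm"
New-DrsRule -Cluster (Get-Cluster -Name "Cluster01") `
            -Name "wsfc-anti-affinity" `
            -KeepTogether:$false -VM $vms

# --- Inside the guest OS, on one of the WSFC nodes ---
# Allow at least 10 missed heartbeats so a vMotion stun does not trigger failover.
Import-Module FailoverClusters
(Get-Cluster).SameSubnetThreshold = 10
```

Note that PowerCLI and the FailoverClusters module both export a `Get-Cluster` cmdlet, so the two halves should be run in separate sessions as indicated.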

 

Storage Configuration

vSphere options for presenting shared storage to a VM node are shown in Table 2.
Table 2. Supported storage configuration options

vSphere version | Shared disk options | SCSI bus sharing | Controller type | Storage Protocol / Technology
vSphere 8.0 | Clustered VMDKs | physical | NVMe controller (from vSphere 8.0 Update 2 onwards, with guest OS Windows Server 2022), VMware Paravirtual (PVSCSI), LSI Logic SAS | FC, NVMe over FC (NVMe-oF)
vSphere 8.0 | VMware vSphere® Virtual Volumes (vVols), RDM in physical compatibility mode | physical | VMware Paravirtual (PVSCSI), LSI Logic SAS | FC, FCoE, iSCSI
VMware Cloud on AWS | Clustered VMDKs | physical | VMware Paravirtual (PVSCSI), LSI Logic SAS | vSAN
vSAN (vSphere 8.0) | Clustered VMDKs | physical | VMware Paravirtual (PVSCSI), LSI Logic SAS | vSAN

 

 

Clustered VMDKs

Requirements:
VMware ESXi, VMware vCenter®, VMware vSphere VMFS

  • All hosts connected to a clustered VMDK datastore must be on ESXi version 7.x, 8.0 or later and managed by the same vCenter instance (version 7.x, 8.0 or later) while the clustered VMDK flag is being enabled or disabled on the datastore. Once the flag has been enabled or disabled, the hosts can be managed by any vCenter of version 7.x, 8.0 or later.
    Note: all ESXi hosts and the vCenter Server must use the same major version of vSphere.
  • All ESXi hosts hosting nodes of a WSFC must be managed by the same vCenter instance. A cross-vCenter WSFC (i.e., ESXi hosts hosting VM nodes of one WSFC managed by different vCenter instances) is not supported.
  • VMFS version 6 is required.

Virtual Machine (VM)

  • VMDKs must be Eager Zeroed Thick (EZT) Provisioned.
  • Clustered VMDKs must be attached to a virtual SCSI controller with bus sharing set to physical. VM Boot disk (and all VM non-shared disks) should be attached to a separate virtual SCSI controller with bus sharing set to none. Mixing clustered and non-shared disks on a single virtual SCSI controller is not supported.
  • Multi-writer flag must NOT be used.
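The VM-side requirements above can be sketched with VMware PowerCLI. This is an illustrative sketch only, assuming placeholder VM names ("node1-vm", "node2-vm") and a 50 GB clustered disk:

```powershell
# Sketch (VMware PowerCLI): create an eager-zeroed thick (EZT) clustered VMDK
# on the first node and place it on a new PVSCSI controller with bus sharing
# set to physical, separate from the boot-disk controller.
$disk = New-HardDisk -VM (Get-VM "node1-vm") -CapacityGB 50 `
                     -StorageFormat EagerZeroedThick
New-ScsiController -HardDisk $disk -Type ParaVirtual -BusSharingMode Physical

# Attach the same VMDK to the second node as an existing disk; it must also
# land on a controller with physical bus sharing on that VM.
New-HardDisk -VM (Get-VM "node2-vm") -DiskPath $disk.Filename
```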

Datastore

  • The Clustered VMDK support capability must be enabled on the datastore.

Storage Array/LUN

  • Must support SCSI-3 reservation type Write Exclusive – All Registrants (WEAR).
  • Supported storage access protocols: Fibre Channel (FC) and NVMe over FC (NVMe-oF).
  • The physical disks on which the VMDKs are stored should support ATS commands.
  • Requires 512/512e sector-size disks.
  • Storage devices can be claimed by the VMware Native Multipathing Plugin (NMP), the High-Performance Plugin (HPP), or a third-party (non-VMware) multipathing plugin (MPP). Check with the vendor regarding clustered VMDK support before using a third-party MPP.

Supportability

  • Maximum number of 192 clustered VMDKs per ESXi host.

  • Mixing of clustered VMDKs and other types of clustered disks (e.g., pRDMs, vVol) in a single VM is not supported.

  • Placing all VM nodes of a WSFC on the same ESXi host (i.e., Cluster-in-a-Box (CIB)) is not supported.

  • VM nodes of a WSFC must be placed on different ESXi hosts (i.e., Cluster-Across-Boxes (CAB)). The placement must be enforced with DRS MUST anti-affinity rules.

  • The following WSFC parameters must be changed (increased) in the guest OS:

    • (get-cluster -name <cluster-name>).QuorumArbitrationTimeMax = 60

    • (get-cluster -name <cluster-name>).SameSubnetThreshold = 10

    • (get-cluster -name <cluster-name>).CrossSubnetThreshold = 20

    • (get-cluster -name <cluster-name>).RouteHistoryLength = 40

  • Only datastores accessible via FC/NVMe Over FC are currently supported.

  • SCSI-2 reservations are not supported on clustered disks/datastores.
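Inside the guest OS, the four parameter changes listed above can be applied and verified with the FailoverClusters PowerShell module on any one WSFC node; a minimal sketch:

```powershell
Import-Module FailoverClusters

# Apply the recommended values (cluster-wide common properties).
$c = Get-Cluster
$c.QuorumArbitrationTimeMax = 60
$c.SameSubnetThreshold      = 10
$c.CrossSubnetThreshold     = 20
$c.RouteHistoryLength       = 40

# Verify the resulting values.
Get-Cluster | Format-List QuorumArbitrationTimeMax, SameSubnetThreshold, `
                          CrossSubnetThreshold, RouteHistoryLength
```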

 

 

RDM Configuration


RDMs used as clustered disk resources must be added using physical compatibility mode.

  1. Mixing non-shared and shared disks on a single virtual SCSI adapter is not supported. For example, if the system disk (drive C:) is attached to SCSI0:0, the first clustered disk would be attached to SCSI1:0. A VM node of a WSFC has the same virtual SCSI controller maximum as an ordinary VM - up to four (4) virtual SCSI controllers.
  2. Modify advanced settings for the virtual SCSI controller hosting the boot device. Add the following advanced settings to each VM node:

     • scsiX.returnNoConnectDuringAPD = "TRUE"
     • scsiX.returnBusyOnNoConnectStatus = "FALSE"

     Where X is the boot device SCSI bus controller ID number. By default, X is set to 0.
  3. Virtual disk SCSI IDs should be consistent across all VMs hosting nodes of the same WSFC.
  4. The vNVMe controller is not supported for clustered or non-clustered disks (for example, a boot disk must NOT be placed on a vNVMe controller), except as described below. See KB 1002149 for details on how to change the controller for the boot disk.

     From vSphere 8.0 Update 2 onwards, the vNVMe controller is supported with guest OS Windows Server 2022. To use a vNVMe controller in a WSFC configuration, the following requirements apply:

     1. The vSphere version must be 8.0 Update 2 or later.
     2. The guest OS must be Windows Server 2022 (validated with build 20348).
     3. The VM hardware version must be 21 or later.
     4. The bus sharing mode must be set to physical.
     5. The following entries must be added to the .vmx file of every VM node:
        nvme.specVersion = "103"
        nvme0.returnNoConnectDuringAPD = "TRUE" (for the NVMe controller to which the VM boot disk is attached)
        nvme0.returnBusyOnNoConnectStatus = "FALSE" (for the NVMe controller to which the VM boot disk is attached)

  5. The multi-writer flag must NOT be used.
  6. Use the VMware Paravirtual (PVSCSI) controller for the best performance.

For the best performance, consider distributing disks evenly across as many SCSI controllers as possible, and use an NVMe controller or the VMware Paravirtual (PVSCSI) controller (it provides better performance with lower CPU usage and is the preferred way to attach clustered disk resources).
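The boot-controller advanced settings from step 2 can also be added without editing the .vmx file directly. A sketch using VMware PowerCLI, assuming the boot disk sits on controller scsi0 (X=0) and a placeholder VM name "node1-vm":

```powershell
# Sketch (VMware PowerCLI): add the advanced settings for the virtual SCSI
# controller hosting the boot device to a VM node.
$vm = Get-VM -Name "node1-vm"
New-AdvancedSetting -Entity $vm -Name "scsi0.returnNoConnectDuringAPD" `
                    -Value "TRUE" -Confirm:$false
New-AdvancedSetting -Entity $vm -Name "scsi0.returnBusyOnNoConnectStatus" `
                    -Value "FALSE" -Confirm:$false
```

Repeat for every VM node; the VM should be powered off when the settings are applied.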

Storage protocols

  • FC, FCoE and Native iSCSI are fully supported with pRDMs and vVols.
  • Clustered VMDKs are supported with FC/NVMe over FC only.

NFS is not a supported storage protocol for accessing a clustered disk resource for WSFC. NFS-backed VMDKs can be used as non-shared disks (system disk, backup, etc.) without limitations.

  • vSAN natively supports clustered VMDKs in vSphere 8.x.
  • Virtual Volumes – supported. Check with your storage vendor whether their vVol implementation includes support for WSFC on VMware vSphere.

Multipathing configuration : Path Selection Policy (PSP)

RDM physical mode, vVols, clustered VMDKs: the Round Robin PSP is fully supported. Fixed and MRU PSPs can be used as well, but the Round Robin PSP might provide better performance by utilizing all available paths to the storage array.

 Note:

  • While choosing a PSP, consult your storage array vendor for the recommended/supported PSP. For more information, see the Storage/SAN Compatibility Guide.
  • If the number of paths to a storage device exceeds five, WSFC cluster storage validation may fail when the Round Robin path policy (PSP_RR) is used with iops=1. In such a case, we recommend setting the number of IOPS to 5 or more when the Round Robin path policy is used. This is not a VMware issue.

Perennial reservations

VMware recommends implementing perennial reservations on all ESXi hosts hosting VM nodes with pRDMs and clustered VMDKs. See KB 1016106 for more details.
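On each ESXi host, a perennial reservation is set per device. A sketch from the ESXi Shell (or SSH); the naa ID below is a placeholder for the device ID of your shared LUN:

```shell
# Mark a clustered LUN as perennially reserved so the host does not wait on it
# during boot/rescan. Run on every ESXi host that can see the LUN.
esxcli storage core device setconfig -d naa.xxxxxxxxxxxxxxxx --perennially-reserved=true

# Verify: the output should show "Is Perennially Reserved: true".
esxcli storage core device list -d naa.xxxxxxxxxxxxxxxx
```

The setting is per host and persists across reboots; it must be repeated for each shared device on each host.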

In-guest shared storage configuration options

Maintaining in guest options for storage (such as iSCSI or SMB shares) is up to those implementing the solution and is not visible to ESXi.

VMware fully supports a configuration of WSFC using in-guest iSCSI initiators or in-guest SMB (Server Message Block) protocol, provided that all other configuration meets the documented and supported WSFC configuration. Using this configuration in VMware virtual machines is similar to using it in physical environments. 

Note: vMotion has not been tested by VMware with any in-guest shared storage configurations.

VM limitations when hosting a WSFC with shared disks on vSphere

Hot changes to virtual machine hardware might disrupt the heartbeat between the WSFC nodes. The following activities are not supported and can cause WSFC node failover:

  • Hot adding memory.
  • Hot adding CPU.
  • Hot adding storage controllers such as LSI Logic SAS, PVSCSI, or NVMe.
  • Hot adding network adapters.
  • Any other hardware changes while cluster VMs are in the powered-on state, except hot-adding a disk or hot-sharing a disk.
  • Using snapshots.
  • Cloning a VM.
  • Pausing and/or resuming the virtual machine state.
  • Memory over-commitment leading to ESXi swapping or VM memory ballooning.
  • Sharing disks between virtual machines without a clustering solution may lead to data corruption.
  • Online extension of a clustered VMDK is not supported. This KB will be updated when support for this operation becomes available.
  • From vSphere 8.0 Update 2, online extension of a shared vVol disk is supported.
  • vSCSI filters and I/O filters are not supported with clustered VMDKs or passthrough RDMs.

A shared disk resource backed by a pRDM can be extended online or offline.

SCSI controller bus sharing should be set to "physical" for SRDF (Symmetrix Remote Data Facility) type configurations.

 

Microsoft support policies for a virtualized deployment of WSFC

Microsoft supports deployment of WSFC on a VM. Check the Microsoft SVVP Program for more details on Windows Server Virtualization Validation Program. Also, check the Microsoft KB article Support policy for Microsoft SQL Server products that are running in a hardware virtualization environment.

 

Additional Information


Disclaimer: VMware is not responsible for the reliability of any data, opinions, advice, or statements made on third-party websites. The inclusion of such links does not imply that VMware endorses, recommends, or accepts any responsibility for the content of such sites.

---------

How to perform Storage vMotion (SvMotion) for WSFC VMs with shared disks:

1. Power off both VMs.

2. Remove all shared disks from the secondary VM.

3. Migrate both VMs to the destination datastore.

4. Add the shared disks back to the secondary VM as existing disks.

5. Power on both VMs.

Note: If you do not follow these instructions and simply migrate both VMs to a different datastore, the shared disks will be duplicated – the VMs will no longer share them, and the datastore usage will double.
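The steps above can be sketched with VMware PowerCLI. This is an illustrative sketch only: the VM and datastore names are placeholders, and the `-like "*shared*"` filter is a hypothetical way to identify the shared VMDKs in your environment:

```powershell
# 1. Power off both VMs (hard power-off; shut the guests down first if needed).
$vm1 = Get-VM "node1-vm"; $vm2 = Get-VM "node2-vm"
Stop-VM -VM $vm1, $vm2 -Confirm:$false

# 2. Remove the shared disks from the secondary VM.
#    Do NOT use -DeletePermanently: the VMDK files must stay on disk.
$shared = Get-HardDisk -VM $vm2 | Where-Object { $_.Filename -like "*shared*" }
Remove-HardDisk -HardDisk $shared -Confirm:$false

# 3. Migrate both VMs to the destination datastore.
$dst = Get-Datastore "dst-datastore"
Move-VM -VM $vm1 -Datastore $dst
Move-VM -VM $vm2 -Datastore $dst

# 4. Re-attach the shared disks (now at their new paths) as existing disks.
Get-HardDisk -VM (Get-VM "node1-vm") |
    Where-Object { $_.Filename -like "*shared*" } |
    ForEach-Object { New-HardDisk -VM $vm2 -DiskPath $_.Filename }

# 5. Power on both VMs.
Start-VM -VM $vm1, $vm2
```

Re-attached disks must land on a controller with bus sharing set to physical, as described in the storage configuration sections above.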