Error: "There are currently [X] usable fault domains. The operation requires [Y] more usable fault domains" in vSAN

Products

VMware vSAN

Issue/Introduction

Virtual machine snapshot creation, snapshot revert, or vMotion operations fail. The following error appears in the vSphere Client: There are currently #### usable fault domains. The operation requires #### more usable fault domain.

Symptoms include:

- Virtual machine (VM) snapshot creation or revert fails.
- Storage vMotion or standard vMotion fails during the "Create object" phase.
- VM provisioning or deployment fails.

Example errors that can appear when performing tasks in a vSAN cluster:

There are currently 2 usable fault domains. The operation requires 1 more usable fault domain.
- The numbers may vary based on the cluster configuration and storage policy.

An error occurred while taking a snapshot: Out of resources.

Failed to create object. Objects are in reduced-availability-with-no-rebuild

CLI example from a host:

[root@Hostname :~] esxcli vsan debug object health summary get
Health Status                                              Number Of Objects
---------------------------------------------------------  -----------------
remoteAccessible                                                           0
inaccessible                                                               0
reduced-availability-with-no-rebuild                                       2
reduced-availability-with-no-rebuild-delay-timer                           0
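
The summary above can also be checked from a script. The following is a minimal sketch, not an official tool: `object_health_count` is a hypothetical helper name, and it assumes the two-column layout shown in the sample output (state name, then object count).

```shell
# Hypothetical helper: print the object count for one health state.
# Reads the output of `esxcli vsan debug object health summary get` on stdin.
# Assumes each data row is "<state> <count>", as in the sample above.
object_health_count() {
    awk -v state="$1" '
        $1 == state { print $2; found = 1 }   # exact match on the state name
        END { if (!found) print 0 }           # state not listed: report 0
    '
}

# Example use on an ESXi host (shown as a comment so the sketch stays side-effect free):
# esxcli vsan debug object health summary get \
#     | object_health_count reduced-availability-with-no-rebuild
```

Because the state name is matched exactly, the count for reduced-availability-with-no-rebuild is not confused with the reduced-availability-with-no-rebuild-delay-timer row.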

During virtual machine creation, the following message is reported: Datastore does not match current VM policy.

Environment

VMware vSAN (All versions)
VMware vSphere ESXi
VMware vCenter Server

Cause

The vSAN cluster does not have enough healthy, active fault domains to satisfy the Failures to Tolerate (FTT) rule defined in the assigned VM Storage Policy. In vSAN, a fault domain is typically an individual ESXi host, an explicitly defined group of hosts, or the vSAN Witness appliance.

Common triggers for this resource deficit include:

Disk Group or Drive Failure: A disk group failure (e.g., cache drive failure or multiple capacity drive failures) has rendered the storage components on a host inaccessible, effectively removing that host's contribution to the available fault domains for the affected object.
Host Offline or Disconnected: One or more ESXi hosts are disconnected, unresponsive, experiencing a network partition, or have been placed in Maintenance Mode.
Witness Unavailability: In a 2-Node or Stretched Cluster, the vSAN Witness Appliance is offline, disconnected, or isolated.
Topology Constraints: The physical cluster lacks the total required nodes to support the assigned storage policy (e.g., attempting FTT=2 on a 3-node cluster).

Storage Policy	Minimum number of standalone hosts / fault domains
FTT-1 RAID-1	3
FTT-2 RAID-1	5
FTT-3 RAID-1	7
FTT-1 RAID-5	4
FTT-2 RAID-6	6
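
The table can be expressed as a small calculation. The sketch below is illustrative only: the function names are hypothetical, the RAID-1 rows follow the 2 x FTT + 1 pattern visible in the table, and the RAID-5 and RAID-6 values are copied from the table as fixed numbers. The shortfall calculation assumes that the "requires #### more" figure in the error is the required count minus the usable count.

```shell
# Hypothetical helper: minimum fault domains for a policy, per the table above.
# Usage: min_fault_domains <raid1|raid5|raid6> <ftt>
min_fault_domains() {
    raid="$1"
    ftt="$2"
    case "$raid" in
        raid1) echo $((2 * ftt + 1)) ;;  # table rows: FTT 1, 2, 3 give 3, 5, 7
        raid5) echo 4 ;;                 # table row: FTT-1 RAID-5
        raid6) echo 6 ;;                 # table row: FTT-2 RAID-6
        *) echo "unknown RAID level: $raid" >&2; return 1 ;;
    esac
}

# Hypothetical helper: how many more usable fault domains are needed.
# Usage: fault_domain_shortfall <raid> <ftt> <usable>
fault_domain_shortfall() {
    required=$(min_fault_domains "$1" "$2") || return 1
    short=$((required - $3))
    if [ "$short" -lt 0 ]; then
        short=0                          # enough fault domains are available
    fi
    echo "$short"
}
```

For example, `fault_domain_shortfall raid1 1 2` prints 1, which corresponds to the sample error "There are currently 2 usable fault domains. The operation requires 1 more usable fault domain."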

Resolution

Verify Host Connectivity: Check the connection state and health of all ESXi hosts in the vSAN cluster via the vCenter Server inventory. Ensure that no hosts show as disconnected or "Not Responding", and take hosts out of Maintenance Mode where applicable.
To check the host maintenance mode status, run the following command on the host:
esxcli vsan cluster get

Check for Decommission State Desynchronization: Connect to any vSAN host via SSH and run the following command to check the CMMDS NODE_DECOM_STATE for all cluster members:

echo "hostname,decomState,decomJobType";for host in $(cmmds-tool find -t HOSTNAME -f json |grep -B2 Healthy|grep uuid|awk -F \" '{print $4}');do hostName=$(cmmds-tool find -t HOSTNAME -f json -u $host|grep content|awk -F \" '{print $6}');decomInfo=$(cmmds-tool find -t NODE_DECOM_STATE -f json -u $host |grep content|awk '{print $3 $5}'|sed 's/,$//');echo "$hostName,$decomInfo";done|sort

Sample output:

hostname,decomState,decomJobType

esxi-1.example.com,0,0
esxi-2.example.com,0,0
esxi-3.example.com,0,0

Any decomState value other than 0 means that the host is in a vSAN decommission (Decom) state, for example:

hostname,decomState,decomJobType

esxi-1.example.com,0,0
esxi-2.example.com,0,0
esxi-3.example.com,6,0 <--- host is in Decom State

If you find a host in a Decom State (any value other than 0), place the host into maintenance mode using the 'No Action' (No data migration) option, and then take the host out of maintenance mode using the ESXi Host UI or vCenter UI to clear this state. See KB vSAN host maintenance mode is in sync with vCenter but not in esxi level (318411).
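
On larger clusters it can help to filter the CSV output for non-zero states. The following is a minimal sketch under the assumption that the input has the three-column format shown in the sample output; `hosts_in_decom_state` is a hypothetical helper name, not part of any VMware tooling.

```shell
# Hypothetical helper: list hosts whose decomState is not 0.
# Reads "hostname,decomState,decomJobType" lines on stdin (format assumed
# from the sample output above); the header and blank lines are skipped.
hosts_in_decom_state() {
    awk -F, '
        $1 == "hostname" { next }     # skip the header row
        NF < 3 { next }               # skip blank or incomplete lines
        $2 + 0 != 0 { print $1 " (decomState=" $2 ")" }
    '
}
```

The output of the check command above can be piped into the function. An empty result means that no host reported a non-zero decomState.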

Inspect Disk Group Health: Navigate to the cluster in vCenter > Configure > vSAN > Disk Management. Verify the operational state of all disk groups. Look for degraded, offline, or unmounted disk groups.
- Remediate Failed Storage Devices: If a disk and/or disk group has failed, identify the faulty physical drive(s) (cache or capacity tier). Replace the failed hardware, remove the faulty disk and/or disk group, add the new disk to the disk group or recreate the disk group with the new disk if needed, then allow vSAN to rebuild the affected components.
- For explicit instructions on safely removing a failed disk or disk group prior to physical replacement (accounting for deduplication/compression configurations), see the official VMware documentation and KB articles.
Verify Witness Connectivity: The witness host must be restored to an operational state and, if it is partitioned, rejoined to the cluster.
- Refer to the following KB article to troubleshoot witness connectivity in a stretched cluster:
  Troubleshooting vSAN Witness appliance partitioned from the stretched cluster
- Temporary Workaround: Deploy a new Witness appliance and integrate it into the cluster by replacing the existing (failed/missing) Witness node.
  1. Download the vSAN Witness Appliance OVF that matches your current environment's version and build number from the customer portal. (Note: The vSAN witness appliance includes an embedded license.)
  2. Deploy the downloaded OVF.
  3. Verify that forward and reverse DNS entries are configured, and ensure that there are no IP address conflicts on the newly deployed witness.
  4. Confirm network reachability to the newly deployed vSAN witness.
  5. Add the appliance to the vCenter inventory as a standalone host.
  6. In the vSphere Client, navigate to your target cluster and select Configure > vSAN > Fault Domains & Stretched Cluster. From the top-right corner, click Configure Stretched Cluster (or Replace Witness, if you are swapping out an existing node).

Workaround

Note: The workaround below is only a temporary fix while waiting for failed hardware to be replaced. Do not leave the advanced setting "/VSAN/ClomForceProvisionPlacements" permanently enabled: objects created while it is enabled may not have the redundancy required by their storage policy, which can lead to data unavailability or even data loss.

For vSAN 6.7 onwards, the following workaround is available:

Set the advanced configuration option on all hosts so that snapshots can be created and reverted until the fault domain issue has been resolved:
esxcfg-advcfg -s 1 /VSAN/ClomForceProvisionPlacements
Once the fault domain issue has been resolved, revert the setting on all hosts:
esxcfg-advcfg -s 0 /VSAN/ClomForceProvisionPlacements

Keep in mind that while this setting is enabled, objects can be created even if they do not satisfy the requirements of their storage policy, so it is recommended to change the setting back as soon as the affected host is back online.
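
To confirm that the setting has been reverted on a host, its current value can be read with `esxcfg-advcfg -g`. The sketch below is an illustration only: `clom_force_provision_state` is a hypothetical helper, and it assumes that the command prints a line of the form "Value of ClomForceProvisionPlacements is <number>".

```shell
# Hypothetical helper: report whether the workaround setting is still enabled.
# Reads the output of `esxcfg-advcfg -g /VSAN/ClomForceProvisionPlacements`
# on stdin. Assumed output format: "Value of ClomForceProvisionPlacements is 0"
clom_force_provision_state() {
    awk '
        /ClomForceProvisionPlacements/ { v = $NF; found = 1 }  # last field is the value
        END {
            if (!found)      print "unknown"
            else if (v == 1) print "enabled"
            else             print "disabled"
        }
    '
}

# Example use on an ESXi host (shown as a comment so the sketch stays side-effect free):
# esxcfg-advcfg -g /VSAN/ClomForceProvisionPlacements | clom_force_provision_state
```

Run the check on every host in the cluster. Any host that still reports "enabled" after the fault domain issue is resolved needs the setting changed back to 0.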

Additional Information

For further information on fault domains and vSAN cluster sizing, review Fault Domain Design and Sizing.

For safety reasons, in its default configuration vSAN does not allow new objects to be provisioned if there are not enough resources to satisfy the applied storage policy.

To speak with a customer representative or a Support Engineer, see Contact Support. Scroll to the bottom of the page and click on your respective region.