Error: "There are currently [X] usable fault domains. The operation requires [Y] more usable fault domains" in vSAN

Article ID: 326856

Products

VMware vSAN

Issue/Introduction

Symptoms include:

    • Creating or reverting a virtual machine (VM) snapshot fails.

    • Storage vMotion or standard vMotion fails during the "Create object" phase.

    • VM provisioning or deployment fails.



Example errors that may be seen when performing tasks in a vSAN cluster:

    • There are currently 2 usable fault domains. The operation requires 1 more usable fault domain.

      • The numbers may vary depending on the cluster configuration and storage policy.

    • An error occurred while taking a snapshot: Out of resources.

    • Failed to create object. Objects are in reduced-availability-with-no-rebuild 

      • Example from the ESXi host CLI:
        [root@Hostname :~] esxcli vsan debug object health summary get
        Health Status                                              Number Of Objects
        ---------------------------------------------------------  -----------------
        remoteAccessible                                                           0
        inaccessible                                                               0
        reduced-availability-with-no-rebuild                                       2
        reduced-availability-with-no-rebuild-delay-timer                           0

         

    • When creating virtual machines, vCenter reports "Datastore does not match current VM policy".
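
The health summary in the CLI example above can also be checked from a script. The following is a minimal sketch, assuming the two-column layout shown in that example (status name followed by object count); that text layout is not a documented machine-readable format and may differ between vSAN releases. The function name `reduced_availability_report` is hypothetical.

```shell
#!/bin/sh
# reduced_availability_report (hypothetical helper): reads the output of
#   esxcli vsan debug object health summary get
# on stdin and sums the "reduced-availability-with-no-rebuild*" rows.
# Assumes the "<status> <count>" column layout shown in the example above.
# Exit status: 0 if no such objects are reported, 1 otherwise.
reduced_availability_report() {
  awk '
    # Matches both reduced-availability-with-no-rebuild and the
    # reduced-availability-with-no-rebuild-delay-timer row.
    $1 ~ /^reduced-availability-with-no-rebuild/ {
      print $1 ": " $2
      total += $2
    }
    END {
      if (total > 0) {
        print "WARNING: " total " object(s) in a reduced-availability state"
        exit 1
      }
      print "OK: no reduced-availability objects reported"
    }
  '
}

# Usage on an ESXi host:
#   esxcli vsan debug object health summary get | reduced_availability_report
```

Returning a non-zero exit status when affected objects exist lets the check be combined with other commands (for example with `&&` or `if`) without parsing the text a second time.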


Environment

VMware vSAN (All versions)
VMware vSphere ESXi
VMware vCenter Server

Cause

The vSAN cluster does not have enough healthy, active fault domains to satisfy the Failures to Tolerate (FTT) rule defined in the assigned VM Storage Policy. In vSAN, a fault domain is typically an individual ESXi host, an explicitly defined group of hosts, or the vSAN Witness appliance.

Common triggers for this resource deficit include:

  • Disk Group or Drive Failure: A disk group failure (e.g., cache drive failure or multiple capacity drive failures) has rendered the storage components on a host inaccessible, effectively removing that host's contribution to the available fault domains for the affected object.

  • Host Offline or Disconnected: One or more ESXi hosts are disconnected, unresponsive, experiencing a network partition, or have been placed in Maintenance Mode.

  • Witness Unavailability: In a 2-Node or Stretched Cluster, the vSAN Witness Appliance is offline, disconnected, or isolated.

  • Topology Constraints: The physical cluster lacks the total required nodes to support the assigned storage policy (e.g., attempting FTT=2 on a 3-node cluster).

Storage Policy    Minimum number of standalone hosts / fault domains
FTT-1 RAID-1      3
FTT-2 RAID-1      5
FTT-3 RAID-1      7
FTT-1 RAID-5      4
FTT-2 RAID-6      6
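
For RAID-1, the minimums in the table follow from the object layout: tolerating FTT failures requires FTT+1 replicas plus FTT witness components, each in a separate fault domain, so 2 × FTT + 1 fault domains. RAID-5 and RAID-6 have fixed minimums of 4 and 6. The sketch below (hypothetical function names) encodes the table and computes how many more usable fault domains a policy needs. It is a sanity check only and does not reproduce vSAN's own placement logic.

```shell
#!/bin/sh
# min_fault_domains FTT RAID (hypothetical helper): prints the minimum number
# of standalone hosts / fault domains from the table above.
# RAID is 1, 5 or 6. Returns 1 for combinations the table does not list.
min_fault_domains() {
  local ftt=$1 raid=$2
  case "$raid" in
    1)
      # FTT+1 replicas plus FTT witness components, one per fault domain
      case "$ftt" in
        1|2|3) echo $((2 * ftt + 1)) ;;
        *) return 1 ;;
      esac
      ;;
    5)
      # The table lists RAID-5 only with FTT=1
      if [ "$ftt" -eq 1 ]; then echo 4; else return 1; fi ;;
    6)
      # The table lists RAID-6 only with FTT=2
      if [ "$ftt" -eq 2 ]; then echo 6; else return 1; fi ;;
    *)
      return 1 ;;
  esac
}

# fault_domain_deficit FTT RAID USABLE (hypothetical helper): prints how many
# more usable fault domains the policy needs (0 if there are already enough).
fault_domain_deficit() {
  local required missing
  # Declared separately so "|| return 1" sees the exit status of the lookup
  required=$(min_fault_domains "$1" "$2") || return 1
  missing=$((required - $3))
  if [ "$missing" -lt 0 ]; then missing=0; fi
  echo "$missing"
}
```

For example, `fault_domain_deficit 1 1 2` (FTT-1 RAID-1 with 2 usable fault domains) prints `1`, which matches the example error "There are currently 2 usable fault domains. The operation requires 1 more usable fault domain."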

 

 

 
 

Resolution

  1. Verify Host Connectivity: Check the connection state and health of all ESXi hosts in the vSAN cluster via the vCenter Server inventory. Ensure no hosts show as disconnected or "Not Responding" and exit Maintenance Mode if applicable.

  2. Check for Decommission State Desynchronization: Connect to any vSAN host via SSH and run the following command to check the CMMDS NODE_DECOM_STATE for all cluster members:

    echo "hostname,decomState,decomJobType";for host in $(cmmds-tool find -t HOSTNAME -f json |grep -B2 Healthy|grep uuid|awk -F \" '{print $4}');do hostName=$(cmmds-tool find -t HOSTNAME -f json -u $host|grep content|awk -F \" '{print $6}');decomInfo=$(cmmds-tool find -t NODE_DECOM_STATE -f json -u $host |grep content|awk '{print $3 $5}'|sed 's/,$//');echo "$hostName,$decomInfo";done|sort
    
    Sample output:
    
    hostname,decomState,decomJobType
    
    esxi-1.example.com,0,0
    esxi-2.example.com,0,0
    esxi-3.example.com,0,0
    
    A non-zero decomState means the host is in a vSAN Decom State, as in this example:
    
    hostname,decomState,decomJobType
    
    esxi-1.example.com,0,0
    esxi-2.example.com,0,0
    esxi-3.example.com,6,0 <--- host is in Decom State
      

    If a host is in a Decom State (any value other than 0), place the host into maintenance mode using the 'No Action' (no data migration) option, then take it out of maintenance mode using the ESXi Host UI or vCenter UI to clear the state. See KB 318411, "vSAN host maintenance mode is in sync with vCenter but not in esxi level".

  3. Inspect Disk Group Health: Navigate to the cluster in vCenter > Configure > vSAN > Disk Management. Verify the operational state of all disk groups. Look for degraded, offline, or unmounted disk groups.
    • Remediate Failed Storage Devices: If a disk or disk group has failed, identify the faulty physical drive(s) (cache or capacity tier). Remove the failed disk or disk group from vSAN, replace the hardware, then add the new disk to the disk group (or recreate the disk group with the new disk if needed) and allow vSAN to rebuild the affected components.
    • For detailed instructions on safely removing a failed disk or disk group before physical replacement (accounting for deduplication and compression configurations), see the official VMware documentation and KB articles.
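
On larger clusters, the CSV produced by the command in step 2 can be filtered rather than read by eye. The sketch below is a hypothetical helper, `find_decom_hosts`, that assumes the `hostname,decomState,decomJobType` format shown in the sample output; it prints each host with a non-zero decomState and exits with status 1 if it finds one.

```shell
#!/bin/sh
# find_decom_hosts (hypothetical helper): reads the
# "hostname,decomState,decomJobType" CSV produced by the step 2 command
# on stdin and prints every host whose decomState is not 0.
# Exit status: 1 if at least one host is flagged, 0 otherwise.
find_decom_hosts() {
  awk -F, '
    # Skip the header line and blank lines
    $1 == "hostname" || NF < 2 { next }
    $2 != "0" {
      print $1 ": decomState=" $2 ", decomJobType=" $3
      found = 1
    }
    END { exit (found ? 1 : 0) }
  '
}
```

To use it, define the function in the SSH session and append `| find_decom_hosts` after `sort` at the end of the step 2 command. In that case the header line is printed directly instead of being piped, which the function tolerates.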

 

Workaround

Note: The workaround below is only a temporary fix while waiting for failed hardware to be replaced. The advanced setting "/VSAN/ClomForceProvisionPlacements" should not be left permanently enabled, because your data will not be properly protected by redundancy, which can lead to data unavailability or even data loss.

For vSAN 6.7 and later, the following workaround is available:

  1. Set the advanced configuration option on all hosts to allow snapshots to be created or reverted until the fault domain issue has been resolved:
    esxcfg-advcfg -s 1 /VSAN/ClomForceProvisionPlacements
  2. Once the fault domain issue has been resolved, revert the setting on all hosts:
    esxcfg-advcfg -s 0 /VSAN/ClomForceProvisionPlacements
While this option is enabled, objects can be created even if they do not satisfy the requirements of their storage policy, so revert it as soon as the affected host is back online.
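
Because the workaround has to be applied and later reverted on every host, a loop over the hosts makes it harder to miss one. The sketch below assumes SSH is enabled on each ESXi host and that `HOSTS` (placeholder host names) is edited to match the cluster; `DRY_RUN=1` is a hypothetical switch that only prints the commands. Besides the `esxcfg-advcfg -s` commands above, it uses `esxcfg-advcfg -g` to read back the current value.

```shell
#!/bin/sh
# Placeholder host names: replace with the ESXi hosts in the affected cluster.
HOSTS="esxi-1.example.com esxi-2.example.com esxi-3.example.com"
OPTION=/VSAN/ClomForceProvisionPlacements

# set_force_provision 0|1 (hypothetical helper): runs
# "esxcfg-advcfg -s <value> /VSAN/ClomForceProvisionPlacements" on every host
# over SSH. With DRY_RUN=1 it only prints the commands.
set_force_provision() {
  local value=$1 host
  case "$value" in
    0|1) ;;
    *) echo "usage: set_force_provision 0|1" >&2; return 2 ;;
  esac
  for host in $HOSTS; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh root@$host esxcfg-advcfg -s $value $OPTION"
    else
      # Stop at the first failure so a partially applied change is noticed.
      ssh "root@$host" "esxcfg-advcfg -s $value $OPTION" || return 1
    fi
  done
}

# show_force_provision (hypothetical helper): prints the current value on
# every host using "esxcfg-advcfg -g".
show_force_provision() {
  local host
  for host in $HOSTS; do
    printf '%s: ' "$host"
    ssh "root@$host" "esxcfg-advcfg -g $OPTION"
  done
}
```

A typical sequence is `DRY_RUN=1 set_force_provision 1` to review the commands, `set_force_provision 1` to apply them, and, once the hardware issue is fixed, `set_force_provision 0` followed by `show_force_provision` to confirm that every host is back to 0.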

Additional Information

Refer to the VMware vSAN documentation on fault domains for further information on vSAN cluster sizing.
For safety reasons, in its default configuration vSAN will not allow the provisioning of new objects if there are not enough resources to satisfy the applied storage policy.