vSAN host fails to enter Maintenance Mode with 'Full data migration' option selected

Article ID: 326857

Updated On: 02-27-2025

Products

VMware vSAN

Issue/Introduction

Awareness of vSAN cluster requirements.

Symptoms:
A host fails to enter maintenance mode using Full Data Evacuation if the remaining hosts cannot provide the minimum number of fault domains that the storage policy requires. In effect, the cluster needs N+1 fault domains, where N is the minimum the policy requires.



The task using Full Data Evacuation will also hang and/or fail when there is insufficient space to migrate all of the data to other hosts.

Environment

VMware vSAN 8.0.x
VMware vSAN 7.0.x
VMware vSAN 6.x

Cause

vSAN requires a certain number of hosts to be active with disk groups contributing capacity and resources in vSAN in order to provide fault tolerance. If the requirements cannot be met, vSAN will fail the pre-check it performs when placing a host into maintenance mode. 

See the below documentation for further information:
Managing Fault Domains in Virtual SAN Clusters

Fault Domains

When a host enters Maintenance Mode, a "What-if" scenario is run. If the result shows that too few Fault Domains would remain available once the host is in Maintenance Mode, the Full Data migration fails.

The clomd.log will provide the related error message:
LOM_CheckClusterResourcesForPolicy: Not enough Upper FD's available. Available: 3, needed: 4
LOM_CheckClusterResourcesForPolicy: Not enough Upper FD's available. Available: 4, needed: 5
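
When scanning a large clomd.log for these entries, a short script can pull out the available and needed counts. The following is a minimal sketch, assuming the message text matches the lines quoted above; the function name find_fd_shortages is hypothetical and not part of any VMware tool.

```python
import re

# Matches the message text quoted in this article; the two numbers vary per cluster.
FD_SHORTAGE = re.compile(
    r"Not enough Upper FD's available\. Available: (\d+), needed: (\d+)"
)


def find_fd_shortages(log_lines):
    """Return an (available, needed) pair for every matching log line."""
    results = []
    for line in log_lines:
        match = FD_SHORTAGE.search(line)
        if match:
            results.append((int(match.group(1)), int(match.group(2))))
    return results


# Sample input taken from the two log lines shown above.
sample = [
    "LOM_CheckClusterResourcesForPolicy: Not enough Upper FD's available. Available: 3, needed: 4",
    "LOM_CheckClusterResourcesForPolicy: Not enough Upper FD's available. Available: 4, needed: 5",
]

for available, needed in find_fd_shortages(sample):
    print(f"{available} fault domain(s) available, {needed} needed")
```

To run it against a real log, pass an open file object instead of the sample list, since the function only iterates over lines.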
 
If the host cannot enter Maintenance Mode with Full Data Evacuation because of insufficient space, the "What-if" result also indicates that the task will fail.

Resolution

For fault domain related failure:
 
If full data migration is needed, first add a 5th host (for RAID5) or a 7th host (for RAID6) to the vSAN cluster.
* Otherwise, change the Storage Policy to FTT=1, FTM=RAID1.

Please be aware that when changing from RAID5 or RAID6 to RAID1, or from RAID6 to RAID5, new components are created first and the old ones are deleted only once the rebuild has completed. Make sure that there is sufficient free space in the cluster before proceeding.
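
The host counts in this section follow from the N+1 rule: the policy's minimum number of fault domains plus one spare for the host being evacuated. The sketch below illustrates that arithmetic. It assumes each host is its own fault domain and uses the commonly documented minimums of 3 for RAID1 with FTT=1, 4 for RAID5, and 6 for RAID6; the table and function names are hypothetical, and other layouts may have different minimums.

```python
# Minimum fault domains per (FTM, FTT) policy.
# Assumption: one host per fault domain, and the minimums listed here.
MIN_FAULT_DOMAINS = {
    ("RAID1", 1): 3,
    ("RAID5", 1): 4,
    ("RAID6", 2): 6,
}


def hosts_needed_for_full_migration(ftm, ftt):
    """Policy minimum plus one spare host to receive the evacuated data."""
    return MIN_FAULT_DOMAINS[(ftm, ftt)] + 1


def can_fully_evacuate(host_count, ftm, ftt):
    """True if one host can leave and the policy minimum is still met."""
    return host_count >= hosts_needed_for_full_migration(ftm, ftt)


print(hosts_needed_for_full_migration("RAID5", 1))  # 5th host needed
print(hosts_needed_for_full_migration("RAID6", 2))  # 7th host needed
print(can_fully_evacuate(4, "RAID5", 1))
print(can_fully_evacuate(4, "RAID1", 1))
```

Under these assumptions, a four-host cluster cannot fully evacuate a host with a RAID5 policy but can with FTT=1, FTM=RAID1, which is why changing the policy is offered as an alternative to adding a host.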

Workaround:
Select Maintenance Mode 'Ensure Accessibility' option - note that all data using RAID5/6 Storage Policies will be in a reduced redundancy state until the host has exited Maintenance Mode and the data resynced back to compliance.

For insufficient space related failure:
Add capacity to the cluster to allow the data to be rebuilt. This can be done by adding hosts or by adding capacity to existing hosts.
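
As a rough way to reason about whether more capacity is needed, the data on the evacuating host can be compared with the free space on the remaining hosts. The sketch below is a simplified estimate only, not the vSAN pre-check: the real "What-if" evaluation also accounts for component placement and storage policy rules. The function name, the input format, and the reserve_fraction parameter are all hypothetical.

```python
def has_room_for_evacuation(hosts, evacuating, reserve_fraction=0.0):
    """Rough estimate: can the other hosts absorb the evacuating host's data?

    hosts maps a host name to (capacity_gb, used_gb).
    reserve_fraction is an assumed share of each host's capacity to keep free.
    """
    used_on_host = hosts[evacuating][1]
    free_elsewhere = 0.0
    for name, (capacity_gb, used_gb) in hosts.items():
        if name == evacuating:
            continue
        usable = capacity_gb * (1 - reserve_fraction)
        # A host already above its usable limit contributes no free space.
        free_elsewhere += max(0.0, usable - used_gb)
    return free_elsewhere >= used_on_host


# Hypothetical four-host cluster with capacities in GB.
cluster = {
    "esx1": (1000, 600),
    "esx2": (1000, 700),
    "esx3": (1000, 800),
    "esx4": (1000, 700),
}

print(has_room_for_evacuation(cluster, "esx1"))
print(has_room_for_evacuation(cluster, "esx1", reserve_fraction=0.1))
```

In this example the other hosts have 800 GB free against 600 GB to move, so the first check passes, while holding back 10% of each host's capacity leaves only 500 GB and the second check fails.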
 
Workaround:
Select Maintenance Mode 'Ensure Accessibility' option - note that all data with components on the host in maintenance mode will be in a reduced redundancy state until the host has exited Maintenance Mode and the data resynced back to compliance.

Additional Information