vSAN Health Service - Cluster health - vSAN optimal datastore default policy configuration
Article ID: 314301

Products

VMware vSAN

Issue/Introduction

This article explains the Cluster health - vSAN optimal datastore default policy configuration check in the vSAN Health Service, why it might report a warning or error, and how to resolve that state.



Environment

VMware vSAN 8.0 U1 and higher

Resolution

Q: What does the Cluster Health – vSAN optimal datastore default policy configuration check do?

This health check verifies whether the cluster's current datastore default policy is optimal. The optimal policy for each cluster type and size is listed in the table below.
Note: EMM = Enter Maintenance Mode, HFTT = Host Failures to Tolerate, SFTT = Site Failures to Tolerate

Columns: Type | Number of Nodes | Recommended FTT | Recommended FTT with node reservation | Details | Host EMM and Remove Operation Impact

Type: Standard cluster

Number of Nodes: 3
  Recommended FTT: HFTT=1 failure - RAID-1 (Mirroring); SFTT=None - standard cluster
  With node reservation: N/A
  Details: Use existing Default vSAN policy
  Host EMM and Remove Operation Impact: Keep the current behavior.

Number of Nodes: 4
  Recommended FTT: HFTT=1 failure - RAID-5 (Erasure Coding); SFTT=None - standard cluster
  With node reservation: HFTT=1 failure - RAID-1 (Mirroring); SFTT=None - standard cluster
  Details: Create new RAID-5 policy
  Host EMM and Remove Operation Impact:
  • User can put one host in EMM using Ensure accessibility.
  • Cannot remove a node from the cluster with full data evacuation.

Number of Nodes: 5
  Recommended FTT: HFTT=1 failure - RAID-5 (Erasure Coding); SFTT=None - standard cluster
  With node reservation: HFTT=1 failure - RAID-5 (Erasure Coding); SFTT=None - standard cluster
  Details: Create new RAID-5 policy
  Host EMM and Remove Operation Impact:
  • User can put one host in EMM using Ensure accessibility.
  • Cannot remove a node from the cluster with full data evacuation.

Number of Nodes: 6
  Recommended FTT: HFTT=2 failures - RAID-6 (Erasure Coding); SFTT=None - standard cluster
  With node reservation: HFTT=1 failure - RAID-5 (Erasure Coding); SFTT=None - standard cluster
  Details: Create new RAID-6 policy
  Host EMM and Remove Operation Impact:
  • User can put one host in EMM using Ensure accessibility.
  • Cannot remove a node from the cluster with full data evacuation.

Number of Nodes: 7 and more
  Recommended FTT: HFTT=2 failures - RAID-6 (Erasure Coding); SFTT=None - standard cluster
  With node reservation: HFTT=2 failures - RAID-6 (Erasure Coding); SFTT=None - standard cluster
  Details: Create new RAID-6 policy
  Host EMM and Remove Operation Impact:
  For 7 nodes:
  • User can put two hosts in EMM using Ensure accessibility.
  • Can remove one node from the cluster with full data evacuation.

Type: Stretched cluster
(To tolerate n failures, 2n+1 hosts are needed in each cluster site.)

Number of Nodes: nodes on each side <= 2
  Recommended FTT: HFTT=No data redundancy; SFTT=Site mirroring - stretched cluster
  With node reservation: N/A
  Details: Create new vSAN ESA stretched cluster policy
  Host EMM and Remove Operation Impact: Existing behavior.

Number of Nodes: nodes on each side == 3
  Recommended FTT: HFTT=1 failure - RAID-1 (Mirroring); SFTT=Site mirroring - stretched cluster
  With node reservation: N/A
  Details: Create new vSAN ESA stretched cluster policy
  Host EMM and Remove Operation Impact: Existing behavior.

Number of Nodes: nodes on each side >= 4 and <= 5
  Recommended FTT: HFTT=1 failure - RAID-5 (Erasure Coding); SFTT=Site mirroring - stretched cluster
  With node reservation: N/A
  Details: Create new vSAN ESA stretched cluster RAID-5 policy
  Host EMM and Remove Operation Impact:
  • User can put one host in EMM using Ensure accessibility.
  • Cannot remove a node from the cluster with full data evacuation.

Number of Nodes: nodes on each side >= 6
  Recommended FTT: HFTT=2 failures - RAID-6 (Erasure Coding); SFTT=Site mirroring - stretched cluster
  With node reservation: N/A
  Details: Create new vSAN ESA stretched cluster RAID-6 policy
  Host EMM and Remove Operation Impact:
  For 6 nodes:
  • User can put one host in EMM using Ensure accessibility.
  • Cannot remove a node from the cluster with full data evacuation.
  For 7 nodes:
  • User can put two hosts in EMM using Ensure accessibility.
  • Can remove one node from the cluster with full data evacuation.

Type: 2-node Stretch

Number of Nodes: 2, Fixed configuration
  Recommended FTT: HFTT=No data redundancy; SFTT=Site mirroring - stretched cluster
  With node reservation: N/A
  Details: Use existing Default vSAN policy
  Host EMM and Remove Operation Impact: Existing behavior.

Note: Using Host mirroring - 2 node cluster with SFTT = 1 and HFTT = 1 requires a minimum of 3 disk groups per data host or 3 disks in a storage pool.
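
The table above can be read as a simple lookup from cluster type and node count to a recommended policy. The following Python is a minimal sketch of that lookup, not the health service's actual implementation. The function name `optimal_policy`, the `Policy` type, and the cluster type strings ("standard", "stretched", "two-node") are hypothetical names chosen for illustration. The 6-node standard cluster entry is an assumption inferred from that row's position between 5 and "7 and more". For stretched clusters, `nodes` means nodes on each side.

```python
from typing import NamedTuple, Optional

RAID1 = "1 failure - RAID-1 (Mirroring)"
RAID5 = "1 failure - RAID-5 (Erasure Coding)"
RAID6 = "2 failures - RAID-6 (Erasure Coding)"
NO_REDUNDANCY = "No data redundancy"
STANDARD_SITE = "None - standard cluster"
STRETCHED_SITE = "Site mirroring - stretched cluster"


class Policy(NamedTuple):
    hftt: str  # host failures to tolerate rule value
    sftt: str  # site failures to tolerate rule value


# Standard cluster rows: nodes -> (recommended, with node reservation).
# None stands for "N/A". The 6-node key is an assumption (see lead-in).
_STANDARD = {
    3: (RAID1, None),
    4: (RAID5, RAID1),
    5: (RAID5, RAID5),
    6: (RAID6, RAID5),
}


def optimal_policy(cluster_type: str, nodes: int,
                   node_reservation: bool = False) -> Optional[Policy]:
    """Return the table's recommended policy, or None where it says N/A."""
    if cluster_type == "standard":
        if nodes < 3:
            raise ValueError("the table starts at 3 nodes for standard clusters")
        # 7 nodes and more share one row: RAID-6 in both columns.
        recommended, reserved = _STANDARD.get(nodes, (RAID6, RAID6))
        hftt = reserved if node_reservation else recommended
        return Policy(hftt, STANDARD_SITE) if hftt else None

    if node_reservation:
        return None  # N/A for stretched and 2-node rows

    if cluster_type == "two-node":
        return Policy(NO_REDUNDANCY, STRETCHED_SITE)

    if cluster_type == "stretched":
        if nodes < 1:
            raise ValueError("nodes per side must be at least 1")
        if nodes <= 2:
            hftt = NO_REDUNDANCY
        elif nodes == 3:
            hftt = RAID1
        elif nodes <= 5:
            hftt = RAID5
        else:
            hftt = RAID6
        return Policy(hftt, STRETCHED_SITE)

    raise ValueError(f"unknown cluster type: {cluster_type!r}")
```

For example, `optimal_policy("standard", 4, node_reservation=True)` returns the RAID-1 policy, matching the 4-node row's "with node reservation" column.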

Note: vCenter equivalent options for Standard Clusters

HFTT = 0 - FTT = No data redundancy, No data redundancy with host affinity
HFTT = 1 - FTT = 1 failure - RAID-1 (Mirroring), 1 failure - RAID-5 (Erasure Coding)
HFTT = 2 - FTT = 2 failures - RAID-1 (Mirroring), 2 failures - RAID-6 (Erasure Coding)
HFTT = 3 - FTT = 3 failures - RAID-1 (Mirroring)
Site disaster tolerance = None - standard cluster

vCenter equivalent options for Stretched Clusters
SFTT = 1 - Site disaster tolerance = Host mirroring - 2 node cluster, Site mirroring - stretched cluster
HFTT = 0 - FTT = No data redundancy, No data redundancy with host affinity
HFTT = 1 - FTT = 1 failure - RAID-1 (Mirroring), 1 failure - RAID-5 (Erasure Coding)
HFTT = 2 - FTT = 2 failures - RAID-1 (Mirroring), 2 failures - RAID-6 (Erasure Coding)
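
The equivalences listed above can be kept as plain lookup tables when translating an HFTT number into the option labels shown in vCenter. This is only an illustrative sketch: the dictionary and function names are hypothetical, and the labels are copied from the lists above.

```python
# HFTT value -> vCenter "Failures to tolerate" option labels, per cluster kind.
FTT_OPTIONS = {
    "standard": {
        0: ["No data redundancy", "No data redundancy with host affinity"],
        1: ["1 failure - RAID-1 (Mirroring)", "1 failure - RAID-5 (Erasure Coding)"],
        2: ["2 failures - RAID-1 (Mirroring)", "2 failures - RAID-6 (Erasure Coding)"],
        3: ["3 failures - RAID-1 (Mirroring)"],
    },
    "stretched": {
        0: ["No data redundancy", "No data redundancy with host affinity"],
        1: ["1 failure - RAID-1 (Mirroring)", "1 failure - RAID-5 (Erasure Coding)"],
        2: ["2 failures - RAID-1 (Mirroring)", "2 failures - RAID-6 (Erasure Coding)"],
    },
}

# Cluster kind -> vCenter "Site disaster tolerance" option labels.
SITE_OPTIONS = {
    "standard": ["None - standard cluster"],
    "stretched": ["Host mirroring - 2 node cluster",
                  "Site mirroring - stretched cluster"],
}


def vcenter_ftt_options(cluster_kind: str, hftt: int) -> list:
    """Return the listed vCenter labels, or an empty list if none is listed."""
    return FTT_OPTIONS.get(cluster_kind, {}).get(hftt, [])
```

An empty result, such as for HFTT = 3 on a stretched cluster, only means the lists above do not include that combination.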

 

Q: What does it mean when it is in a warning state?

When the check is in a warning state, the cluster's current datastore default policy is not optimal. The test table has five columns: policy name | rule name | current value | suggested value | status. The table has two rows: the first row is for the "Failure to tolerate" rule and the second row is for the "Site disaster tolerance" rule. A row in a warning state means that its current rule value does not match the suggested rule value.
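
The row logic described above amounts to comparing each rule's current value with its suggested value. The sketch below illustrates that comparison only. The `Row` type, the function names, and the status strings "OK" and "warning" are assumptions for illustration, not the health service's actual data model.

```python
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    policy_name: str
    rule_name: str
    current_value: str
    suggested_value: str


def row_status(row: Row) -> str:
    """A row is in warning when its current value differs from the suggestion."""
    return "OK" if row.current_value == row.suggested_value else "warning"


def check_status(rows: Iterable[Row]) -> str:
    """The check is in warning as soon as any row is in warning."""
    return "warning" if any(row_status(r) == "warning" for r in rows) else "OK"


# Hypothetical example: a policy still on RAID-1 where RAID-5 is suggested.
rows = [
    Row("Example policy", "Failure to tolerate",
        "1 failure - RAID-1 (Mirroring)", "1 failure - RAID-5 (Erasure Coding)"),
    Row("Example policy", "Site disaster tolerance",
        "None - standard cluster", "None - standard cluster"),
]
statuses = [row_status(r) for r in rows]
overall = check_status(rows)
```

Here only the first row mismatches, so that row and the overall check are in warning while the second row is not.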

Q: How does one troubleshoot and fix the error state?

Go to "Policies and Profiles", select "VM Storage Policy", and click the policy name shown in the health test table. Then edit the "Failure to tolerate" rule or the "Site disaster tolerance" rule so that it matches the suggested value shown in the health test table.

 

Additional Information