vSAN Health Service - Cluster health - vSAN optimal datastore default policy configuration

Article ID: 314301

Updated On:

Products

VMware vSAN

Issue/Introduction

This article describes the "Cluster health - vSAN optimal datastore default policy configuration" check in the vSAN Health Service, explains why it might report a warning or error, and shows how to fix that state.


Environment

VMware vSAN 8.0.x

Resolution

Q: What does the Cluster Health – vSAN optimal datastore default policy configuration check do?

This health test checks whether the cluster's current datastore default policy is optimal. The optimal policy for each cluster type and size is listed in the table below.
Note: EMM = Enter Maintenance Mode, HFTT = Host Failures to Tolerate, SFTT = Site Failures to Tolerate, EnsureAcc = the "Ensure accessibility" maintenance mode option

Type | Number of Nodes | Recommended FTT | Recommended FTT with node reservation | Details | Host EMM and Remove Operation Impact
--- | --- | --- | --- | --- | ---
Standard cluster | 3 | HFTT=1 failure - RAID-1 (Mirroring); SFTT=None - standard cluster | N/A | Use existing Default vSAN policy | Keep the current behavior.
Standard cluster | 4 | HFTT=1 failure - RAID-5 (Erasure Coding); SFTT=None - standard cluster | HFTT=1 failure - RAID-1 (Mirroring); SFTT=None - standard cluster | Create new RAID-5 policy | User can put one host in EMM using EnsureAcc. Cannot remove node from cluster with full data evac.
Standard cluster | 5 | HFTT=1 failure - RAID-5 (Erasure Coding); SFTT=None - standard cluster | HFTT=1 failure - RAID-5 (Erasure Coding); SFTT=None - standard cluster | Create new RAID-5 policy | User can put one host in EMM using EnsureAcc. Cannot remove node from cluster with full data evac.
Standard cluster | 6 | HFTT=2 failures - RAID-6 (Erasure Coding); SFTT=None - standard cluster | HFTT=1 failure - RAID-5 (Erasure Coding); SFTT=None - standard cluster | Create new RAID-6 policy | User can put one host in EMM using EnsureAcc. Cannot remove node from cluster with full data evac.
Standard cluster | 7 and more | HFTT=2 failures - RAID-6 (Erasure Coding); SFTT=None - standard cluster | HFTT=2 failures - RAID-6 (Erasure Coding); SFTT=None - standard cluster | Create new RAID-6 policy | For 7 nodes: User can put two hosts in EMM using EnsureAcc. Can remove 1 node from cluster with full data evac.
Stretched cluster | Nodes on each side <= 2 | HFTT=No data redundancy; SFTT=Site mirroring - stretched cluster (to tolerate n failures, 2n+1 hosts are needed in each cluster site) | N/A | Create new vSAN ESA stretched cluster policy | Existing behavior.
Stretched cluster | Nodes on each side == 3 | HFTT=1 failure - RAID-1 (Mirroring); SFTT=Site mirroring - stretched cluster | N/A | Create new vSAN ESA stretched cluster policy | Existing behavior.
Stretched cluster | Nodes on each side >= 4 and <= 5 | HFTT=1 failure - RAID-5 (Erasure Coding); SFTT=Site mirroring - stretched cluster | N/A | Create new vSAN ESA stretched cluster RAID-5 policy | User can put one host in EMM using EnsureAcc. Cannot remove node from cluster with full data evac.
Stretched cluster | Nodes on each side >= 6 | HFTT=2 failures - RAID-6 (Erasure Coding); SFTT=Site mirroring - stretched cluster | N/A | Create new vSAN ESA stretched cluster RAID-6 policy | For 6 nodes: User can put one host in EMM using EnsureAcc. Cannot remove node from cluster with full data evac. For 7 nodes: User can put two hosts in EMM using EnsureAcc. Can remove 1 node from cluster with full data evac.
2-node Stretch | 2, Fixed configuration | HFTT=No data redundancy; SFTT=Site mirroring - stretched cluster | N/A | Use existing Default vSAN policy | Existing behavior.

Note: When using "Host mirroring - 2 node cluster", SFTT = 1 and HFTT = 1. This configuration requires a minimum of 3 disk groups per data host or 3 disks in a storage pool.
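
The table above can be read as a simple lookup from cluster type and node count to the suggested rule values. The following is a minimal Python sketch of that lookup, not the health service's actual implementation. The function and class names are hypothetical, and it assumes the row between the 5-node and 7-node rows applies to 6 nodes. The 2-node fixed configuration is not covered.

```python
from dataclasses import dataclass
from typing import Optional

# Rule value labels as they appear in the table.
RAID1 = "1 failure - RAID-1 (Mirroring)"
RAID5 = "1 failure - RAID-5 (Erasure Coding)"
RAID6 = "2 failures - RAID-6 (Erasure Coding)"
NO_REDUNDANCY = "No data redundancy"
STANDARD = "None - standard cluster"
SITE_MIRRORING = "Site mirroring - stretched cluster"


@dataclass(frozen=True)
class Recommendation:  # hypothetical container for one table row
    hftt: str
    sftt: str


def recommend_standard(nodes: int, node_reservation: bool = False) -> Optional[Recommendation]:
    """Suggested policy for a standard cluster; None where the table says N/A."""
    if nodes < 3:
        raise ValueError("the table starts at 3 nodes for standard clusters")
    if node_reservation:
        if nodes == 3:
            return None  # table lists N/A for 3 nodes with reservation
        if nodes == 4:
            hftt = RAID1
        elif nodes <= 6:
            hftt = RAID5
        else:
            hftt = RAID6
    else:
        if nodes == 3:
            hftt = RAID1
        elif nodes <= 5:
            hftt = RAID5
        else:
            hftt = RAID6
    return Recommendation(hftt, STANDARD)


def recommend_stretched(nodes_per_site: int) -> Recommendation:
    """Suggested policy for a stretched cluster, keyed on nodes per site."""
    if nodes_per_site < 1:
        raise ValueError("each site needs at least one node")
    if nodes_per_site <= 2:
        hftt = NO_REDUNDANCY
    elif nodes_per_site == 3:
        hftt = RAID1
    elif nodes_per_site <= 5:
        hftt = RAID5
    else:
        hftt = RAID6
    return Recommendation(hftt, SITE_MIRRORING)
```

For example, `recommend_standard(4, node_reservation=True)` yields the RAID-1 mirroring value, while `recommend_standard(4)` yields the RAID-5 value, matching the two FTT columns of the 4-node row.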

Note: vCenter equivalent options for Standard Clusters

HFTT = 0 - FTT = No data redundancy, No data redundancy with host affinity
HFTT = 1 - FTT = 1 failure - RAID-1 (Mirroring), 1 failure - RAID-5 (Erasure Coding)
HFTT = 2 - FTT = 2 failures - RAID-1 (Mirroring), 2 failures - RAID-6 (Erasure Coding)
HFTT = 3 - FTT = 3 failures - RAID-1 (Mirroring)
Site disaster tolerance = None - standard cluster

vCenter equivalent options for Stretched Clusters
SFTT = 1 - Site disaster tolerance = Host mirroring - 2 node cluster, Site mirroring - stretched cluster
HFTT = 0 - FTT = No data redundancy, No data redundancy with host affinity
HFTT = 1 - FTT = 1 failure - RAID-1 (Mirroring), 1 failure - RAID-5 (Erasure Coding)
HFTT = 2 - FTT = 2 failures - RAID-1 (Mirroring), 2 failures - RAID-6 (Erasure Coding)
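
The equivalences listed above amount to a mapping from a numeric HFTT value to the option labels shown in vCenter. As a rough illustration, the Python sketch below encodes that mapping and does a reverse lookup from a label to its HFTT value. The dictionary and function names are hypothetical, and the label spacing is normalized to "RAID-5".

```python
# HFTT value -> vCenter "Failures to tolerate" option labels, copied from the lists above.
STANDARD_CLUSTER_FTT = {
    0: ["No data redundancy", "No data redundancy with host affinity"],
    1: ["1 failure - RAID-1 (Mirroring)", "1 failure - RAID-5 (Erasure Coding)"],
    2: ["2 failures - RAID-1 (Mirroring)", "2 failures - RAID-6 (Erasure Coding)"],
    3: ["3 failures - RAID-1 (Mirroring)"],
}

# Stretched clusters list the same options, but only up to HFTT = 2.
STRETCHED_CLUSTER_FTT = {k: v for k, v in STANDARD_CLUSTER_FTT.items() if k <= 2}

# SFTT value -> vCenter "Site disaster tolerance" option labels.
SITE_DISASTER_TOLERANCE = {
    0: ["None - standard cluster"],
    1: ["Host mirroring - 2 node cluster", "Site mirroring - stretched cluster"],
}


def hftt_for_label(label: str, stretched: bool = False) -> int:
    """Return the HFTT value that a vCenter option label corresponds to."""
    options = STRETCHED_CLUSTER_FTT if stretched else STANDARD_CLUSTER_FTT
    for hftt, labels in options.items():
        if label in labels:
            return hftt
    raise KeyError(label)
```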

 

Q: What does it mean when it is in a warning state?

A warning state means that the cluster's current datastore default policy is not optimal. The test table has five columns: policy name | rule name | current value | suggested value | status. It has two rows: the first is for the "Failure to tolerate" rule and the second is for the "Site disaster tolerance" rule. A row whose status is a warning indicates that the current rule value does not match the suggested rule value.
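
The per-row comparison described above can be sketched in a few lines of Python. This is only an illustration: the `RuleRow` type, the `"ok"` and `"warning"` status strings, and the policy name in the usage example are hypothetical. Treating the overall check as a warning when any row is in a warning state is also an assumption.

```python
from typing import Iterable, NamedTuple


class RuleRow(NamedTuple):  # hypothetical model of one row in the test table
    policy_name: str
    rule_name: str
    current_value: str
    suggested_value: str


def row_status(row: RuleRow) -> str:
    """A row is in a warning state when current and suggested values differ."""
    return "ok" if row.current_value == row.suggested_value else "warning"


def check_status(rows: Iterable[RuleRow]) -> str:
    """Assumed roll-up: the check warns if any single row warns."""
    return "warning" if any(row_status(r) == "warning" for r in rows) else "ok"


# Usage: a 6-node standard cluster still using a RAID-5 default policy.
rows = [
    RuleRow("Example policy", "Failure to tolerate",
            "1 failure - RAID-5 (Erasure Coding)",
            "2 failures - RAID-6 (Erasure Coding)"),
    RuleRow("Example policy", "Site disaster tolerance",
            "None - standard cluster", "None - standard cluster"),
]
statuses = [row_status(r) for r in rows]
overall = check_status(rows)
```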

Q: How does one troubleshoot and fix the error state?

In the vSphere Client, go to "Policies and Profiles", select "VM Storage Policies", and click the policy name shown in the health test table. Then edit the "Failure to tolerate" rule or the "Site disaster tolerance" rule so that it matches the suggested value shown in the health test table.

 


Additional Information