vSAN Health Service - Limits Health – After one additional host failure


Article ID: 327052


Products

VMware vSAN

Issue/Introduction

This article explains the Limits Health – After one additional host failure check in the vSAN Health Service and provides details on why it might report an error.

Environment

VMware vSAN 8.0.x
VMware vSAN 7.0.x
VMware vSAN 6.7.x
VMware vSAN 6.6.x
VMware vSAN 6.5.x
VMware vSAN 6.2.x
VMware vSAN 6.1.x
VMware vSAN 6.0.x
VMware vSAN 6.x

Cause

In the simulated scenario of one host failure, vSAN disk capacity usage reaches the health check threshold.

Resolution

Q: What does the Limits Health – After one additional host failure check do?

In addition to the basic limits health check, the vSAN Health Service simulates what cluster resources would look like after an ESXi host failure. If a single ESXi host fails, two things happen. First, the resources on that ESXi host (such as cache and capacity) are no longer available. Second, vSAN attempts to re-protect (rebuild) all components belonging to objects that are running with reduced redundancy because of the failure.

This health check simulates both actions described above. It assumes that the ESXi host with the most resources consumed fails, and then calculates how many resources would be used on the remaining hosts in the cluster and how many would still be available.
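As a rough illustration of the aggregate calculation described above, the following Python sketch removes the host with the most consumed capacity and projects usage across the remaining hosts. The `Host` class, its field names, the sample figures, and the assumption that all data from the failed host is rebuilt on the surviving hosts are illustrative only. They are not taken from the vSAN implementation.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical per-host capacity record (not a vSAN API type)."""
    name: str
    capacity_gb: float
    used_gb: float


def simulate_one_host_failure(hosts):
    """Project aggregate capacity usage after the most-consumed host fails."""
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to simulate a failure")

    # The host with the most resources consumed is the one assumed to fail.
    failed = max(hosts, key=lambda h: h.used_gb)

    # Its capacity is no longer available to the cluster.
    remaining_capacity = sum(h.capacity_gb for h in hosts) - failed.capacity_gb
    if remaining_capacity <= 0:
        raise ValueError("no capacity left on the remaining hosts")

    # Assumption: every component is re-protected, so the total amount of
    # data stays the same and must fit on the remaining hosts.
    used_after = sum(h.used_gb for h in hosts)

    return {
        "failed_host": failed.name,
        "used_gb": used_after,
        "capacity_gb": remaining_capacity,
        "used_percent": 100.0 * used_after / remaining_capacity,
    }


# Sample four-host cluster with made-up numbers.
cluster = [
    Host("esx-01", 10000, 6000),
    Host("esx-02", 10000, 5000),
    Host("esx-03", 10000, 5000),
    Host("esx-04", 10000, 4000),
]
result = simulate_one_host_failure(cluster)
print(f"{result['failed_host']} fails: {result['used_percent']:.1f}% used")
```

In this sample, 20000 GB of data must fit in the 30000 GB left on three hosts, so the projected usage is about 66.7%. A projected value above 100% would correspond to the case where re-protection cannot complete for some objects.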

Note: If there is already a failure in the cluster, this test will report on one additional failure. Therefore, this test reports on the results of the current failure and the additional failure that it introduces.

In vSphere 6.7 Update 3 and later releases, the health check is renamed to "Capacity Utilization".

Q: What does it mean when it is in an error state?

If this check reports that more than 100% of resources would be used after a host failure, re-protection would fail for some objects because not enough resources are available.

Note: This health check simulation is very simple. It only looks at cluster aggregate resources, so just like the basic limits check, it does not consider the distribution and placement rules.

However, this simple simulation does verify that a vSAN cluster has been configured with enough resources to operate safely after a failure and the subsequent re-protection. This test does not check for balance or fault domains, so these need to be considered independently of this test.

For example, a user may enforce an operational business policy to have no less than 25% free disk space under normal conditions and no less than 15% free disk space after one failure. This check can be used to implement such a policy and to verify that this is indeed the case.
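The policy in this example can be expressed as a simple check. The sketch below is a minimal illustration and assumes the caller already knows the used capacity, the total capacity, and the capacity that would remain after the simulated failure. The function name, parameter names, and sample figures are hypothetical and do not correspond to any vSAN API.

```python
def check_free_space_policy(
    used_gb,
    capacity_gb,
    capacity_after_failure_gb,
    min_free_normal=25.0,
    min_free_after_failure=15.0,
):
    """Evaluate free-space thresholds before and after one host failure."""
    # Free space under normal conditions, as a percentage of total capacity.
    free_normal = 100.0 * (capacity_gb - used_gb) / capacity_gb

    # Free space after the failure, assuming all data is re-protected on
    # the remaining hosts (so used_gb is unchanged).
    free_after = (
        100.0 * (capacity_after_failure_gb - used_gb) / capacity_after_failure_gb
    )

    return {
        "free_normal_percent": free_normal,
        "free_after_failure_percent": free_after,
        "normal_ok": free_normal >= min_free_normal,
        "after_failure_ok": free_after >= min_free_after_failure,
    }


# Made-up figures: 27000 GB used on a 40000 GB cluster that would drop
# to 30000 GB of capacity after losing one host.
report = check_free_space_policy(27000, 40000, 30000)
print(report)
```

With these sample numbers the cluster has 32.5% free space under normal conditions, which satisfies the 25% rule, but only 10% free after one failure, which violates the 15% rule.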

Q: How does one troubleshoot and fix the error state?

There is no troubleshooting involved in this health check. It is primarily informational. If this health check fails, consider adding resources to the cluster so that a rebuild can succeed after a failure. If you believe the cluster should already have enough capacity to rebuild after a failure, check whether any components, such as disk drives, are in a failed state.

For more information, see Monitor vSAN Capacity (vmware.com).