Q: What does the Network Health - Hosts with vSAN disabled check do?
This health check is similar to another check that looks for unexpected vSAN cluster members. This check ensures that all ESXi hosts in a vSAN cluster have vSAN enabled.
It is possible to have an ESXi host participating in a vSphere cluster that has vSAN enabled at the cluster level, but the ESXi host itself does not have vSAN enabled. This can arise when you use a combination of command line and the vSphere Web Client to manage the vSAN cluster. If a user only uses vCenter Server to manage vSAN, this check should never fail.
One other possible cause of this issue is a misconfigured or non-uniform Host Profile.
The most common cause of such a misconfiguration is that an esxcli vsan cluster leave command was issued on an ESXi host while troubleshooting a vSAN issue, and the host was never re-added to the vSAN cluster. In that case, the ESXi host no longer participates in vSAN.
What does it mean when it is in an error state?
While it may look from a vCenter Server perspective that the ESXi host is fully participating in the vSAN cluster, this may not be the case. By not participating in the cluster, available capacity for both space and performance is reduced.
More importantly, if this ESXi host stores any vSAN data (for example, virtual machine objects) on its local disks, having it removed from vSAN will impact object health.
When vSAN is disabled on an ESXi host, all components on that host enter an ABSENT state from the perspective of the active vSAN ESXi hosts in the cluster. If the ESXi host is disconnected from the vSAN cluster for longer than 60 minutes, the components marked as ABSENT are rebuilt elsewhere in the cluster, leading to unnecessary rebuild I/O, which may in turn impact virtual machine I/O.
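To make the 60-minute window concrete, here is a minimal sketch that computes how much time remains before ABSENT components would start to be rebuilt. The function name time_until_rebuild and the fixed 60-minute delay are assumptions made for illustration; the delay is taken from the text above and may be configured differently in a given environment.

```python
from datetime import datetime, timedelta

# Delay taken from the text above; assumed fixed here for illustration only.
REPAIR_DELAY = timedelta(minutes=60)


def time_until_rebuild(left_at, now, delay=REPAIR_DELAY):
    """Return the time left before ABSENT components are rebuilt.

    Returns a zero timedelta once the delay has already been exceeded.
    """
    remaining = (left_at + delay) - now
    return max(remaining, timedelta(0))


# Hypothetical timestamps: the host left the cluster at 22:00:00 UTC.
left_at = datetime(2019, 11, 6, 22, 0, 0)
now = datetime(2019, 11, 6, 22, 39, 25)
remaining = time_until_rebuild(left_at, now)
print("Time left to rejoin before rebuild:", remaining)
```

If the returned value is zero, the rebuild window has already passed and resync traffic should be expected when checking the cluster.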
How does one troubleshoot and fix the error state?
Verify that all ESXi hosts that are part of the cluster have vSAN enabled. The command esxcli vsan cluster get, when run on an individual ESXi host, shows whether that host is participating in the vSAN cluster. The following example shows the command run on an ESXi host that is part of a healthy 3-node vSAN cluster:
[root@esxi-1:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2019-11-06T22:39:25Z
Local Node UUID: 5cb07974-5991-079e-9f83-005056016fbb
Local Node Type: NORMAL
Local Node State: BACKUP
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 5cb079dc-f5d9-ec9e-95ab-005056016fc3
Sub-Cluster Backup UUID: 5cb07974-5991-079e-9f83-005056016fbb
Sub-Cluster UUID: 52b8fcb7-ab58-8f42-b546-67ffc5fd415d
Sub-Cluster Membership Entry Revision: 12
Sub-Cluster Member Count: 3
Sub-Cluster Member UUIDs: 5cb079dc-f5d9-ec9e-95ab-005056016fc3, 5cb07974-5991-079e-9f83-005056016fbb, 5cb07a44-ec04-17c2-ed5a-005056016fcc
Sub-Cluster Member HostNames: esxi-2.gsslabs.org, esxi-1.gsslabs.org, esxi-3.gsslabs.org
Sub-Cluster Membership UUID: 4f16a55d-7273-d088-2e27-005056016fc3
Unicast Mode Enabled: true
Maintenance Mode State: OFF
Config Generation: 7c677ea0-6581-4c54-b104-356a0bfd2ffc 11 2019-11-01T23:54:05.272
Note: The local node UUID can be retrieved from the output. You can also see the Sub-Cluster Member UUIDs, of which there are three, matching the Sub-Cluster Member Count.
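The fields in this output can also be checked with a short script. The following is a minimal sketch that parses the "Key: value" lines into a dictionary and reports whether vSAN is enabled and whether the local node appears in the member list. The function names parse_cluster_get and summarize are hypothetical, the sample text is an excerpt of the output above, and the sketch assumes the output keeps the format shown here.

```python
SAMPLE = """\
Cluster Information
Enabled: true
Current Local Time: 2019-11-06T22:39:25Z
Local Node UUID: 5cb07974-5991-079e-9f83-005056016fbb
Local Node State: BACKUP
Sub-Cluster Member Count: 3
Sub-Cluster Member UUIDs: 5cb079dc-f5d9-ec9e-95ab-005056016fc3, 5cb07974-5991-079e-9f83-005056016fbb, 5cb07a44-ec04-17c2-ed5a-005056016fcc
"""


def parse_cluster_get(text):
    """Parse 'Key: value' lines into a dict; lines without ': ' are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            info[key] = value.strip()
    return info


def summarize(info):
    """Extract the fields relevant to this health check."""
    raw = info.get("Sub-Cluster Member UUIDs", "")
    members = [u.strip() for u in raw.split(",") if u.strip()]
    return {
        "enabled": info.get("Enabled", "").lower() == "true",
        "reported_count": int(info.get("Sub-Cluster Member Count", "0")),
        "member_uuids": members,
        "local_is_member": info.get("Local Node UUID") in members,
    }


summary = summarize(parse_cluster_get(SAMPLE))
print(summary["enabled"], summary["reported_count"], summary["local_is_member"])
```

A host that reports a member count lower than the number of hosts in the vSphere cluster is a candidate for further investigation.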
To get the UUID of an ESXi host that is not part of the vSAN Cluster, run this command:
esxcli system uuid get
Here is an example:
[root@ESXi-h01:~] esxcli system uuid get
545ca9af-ff4b-fc84-dcee-001f29595f9f
If an ESXi host is identified as not participating in the cluster, the esxcli vsan cluster join command can be used to add it back into the cluster. The command takes the UUID of the cluster to join, which is the Sub-Cluster UUID reported by a participating host.
You can also run the RVC command vsan.cluster_info to display the ESXi hosts that are currently participating in the cluster. For more information on troubleshooting this configuration issue, see the Using RVC to verify Virtual SAN functionality section in the VMware Virtual SAN Diagnostics and Troubleshooting Reference Manual (See Attached).
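Identifying the non-participating host amounts to comparing each host's UUID against the Sub-Cluster Member UUIDs list. The following is a minimal sketch of that comparison. The function name find_missing_hosts and the inventory dictionary are hypothetical, and the sketch assumes that the value returned by esxcli system uuid get is the same identifier that appears in the member list.

```python
def find_missing_hosts(expected_hosts, member_uuids):
    """Return the names of hosts whose UUID is not in the vSAN member list.

    expected_hosts: dict mapping host name -> UUID from `esxcli system uuid get`
    member_uuids:   UUIDs from the 'Sub-Cluster Member UUIDs' field
    """
    members = {u.strip().lower() for u in member_uuids}
    return sorted(
        name
        for name, uuid in expected_hosts.items()
        if uuid.strip().lower() not in members
    )


# Hypothetical inventory built from the example values in this article.
expected_hosts = {
    "esxi-1": "5cb07974-5991-079e-9f83-005056016fbb",
    "esxi-2": "5cb079dc-f5d9-ec9e-95ab-005056016fc3",
    "esxi-3": "5cb07a44-ec04-17c2-ed5a-005056016fcc",
    "ESXi-h01": "545ca9af-ff4b-fc84-dcee-001f29595f9f",
}
member_uuids = [
    "5cb079dc-f5d9-ec9e-95ab-005056016fc3",
    "5cb07974-5991-079e-9f83-005056016fbb",
    "5cb07a44-ec04-17c2-ed5a-005056016fcc",
]

missing = find_missing_hosts(expected_hosts, member_uuids)
print("Hosts not participating in vSAN:", missing)
```

Any host name returned here is a candidate for the rejoin step, after confirming why it left the cluster.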
Re-check the health status after running the esxcli vsan cluster join command, as there may be other underlying issues that caused the ESXi host to leave the cluster in the first place.