vSAN Health Service - Network Health - Unexpected vSAN cluster members

Article ID: 315548

Products

VMware vSAN

Issue/Introduction

This article explains the Network Health - Unexpected vSAN cluster members check in the vSAN Health Service and provides details on why it might report an error.


Environment

VMware vSAN (All Versions)

Resolution

Q: What does the Network Health - Unexpected vSAN cluster members check do?

This health service check tests whether all ESXi hosts participating in vSAN are part of the same vSphere cluster. This matters because cluster-wide features such as Distributed Resource Scheduler (DRS) and vSphere High Availability (HA) cannot include ESXi hosts that are not part of the vSphere cluster, which can lead to operational issues.
 
This check compares the vSphere cluster members to the vSAN cluster members. If you only use the vCenter Server to manage vSAN, this check should never fail, as by definition a vSAN cluster and a vSphere cluster should in effect have all the same members.
 
However, if you change cluster membership from the command line at any time (for example, with esxcli vsan cluster join), it is quite possible to create a misconfigured cluster in which an ESXi host that participates in vSAN is not part of the vSphere cluster. Another possible cause is the use of host profiles.
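
The comparison behind this check amounts to a set difference between the two membership lists. The following Python sketch illustrates the idea only; it is not how the health service is implemented, and the function name and host names are hypothetical.

```python
def find_unexpected_members(vsphere_hosts, vsan_hosts):
    """Return hosts that participate in vSAN but are absent from the vSphere cluster."""
    # Normalize case and whitespace so "ESX01.example.com " matches "esx01.example.com".
    def normalize(names):
        return {name.strip().lower() for name in names}

    return sorted(normalize(vsan_hosts) - normalize(vsphere_hosts))


# Hypothetical host lists, e.g. copied from the vSphere Client and from vsan.cluster_info.
vsphere_cluster = ["esx01.example.com", "esx02.example.com", "esx03.example.com"]
vsan_cluster = [
    "esx01.example.com",
    "esx02.example.com",
    "esx03.example.com",
    "esx04.example.com",
]

unexpected = find_unexpected_members(vsphere_cluster, vsan_cluster)
print(unexpected)  # ['esx04.example.com']
```

An empty result corresponds to the healthy state, in which both lists contain the same hosts.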

Q: What does it mean when it is in an error state?

Even though the ESXi host might not be part of the vSphere cluster, vSAN still uses the ESXi host to store data and service I/O. In other words, the datastore continues to function correctly.
 
ESXi hosts that are disconnected from their vCenter Server can also show up as unexpected members. Reconnecting them to the vCenter Server resolves this health check issue.
 
However, when the cluster is in such a situation, it can give rise to operational hazards. As the ESXi host is not tracked as a part of the vSphere cluster, it is very easy to overlook the critical role the ESXi host plays in the availability and persistence of data on vSAN.
 
For example, inadvertently rebooting the ESXi host, repurposing it for another use, or simply placing it into maintenance mode may cause issues for the vSAN cluster and impact the availability of the virtual machines running on the cluster. You may not notice that impact based on what the vSphere Web Client reports. There may be no warning that the ESXi host is about to be repurposed for some other use, because the vCenter Server does not recognize the ESXi host as part of the cluster.

Q: How does one troubleshoot and fix the error state?

To get an overall view of the vSAN cluster state, use Ruby vSphere Console (RVC) commands such as vsan.cluster_info. The output lists all ESXi hosts that are participating in vSAN, which you can compare against the list of hosts in the vSphere cluster to determine which host is not included.
 
To get an individual host's view of the cluster, run this command on that host:

esxcli vsan cluster get
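
To compare that view across hosts, the command's output can be captured and parsed. The Python sketch below is illustrative: the sample text uses made-up UUIDs, and the field names (Sub-Cluster Member Count, Sub-Cluster Member UUIDs) are assumed to match what your ESXi build prints, so verify them against real output first.

```python
def parse_cluster_get(output):
    """Parse 'Key: Value' lines from captured `esxcli vsan cluster get` output into a dict."""
    info = {}
    for line in output.splitlines():
        # Skip the heading and any line without a key/value separator.
        if ":" not in line:
            continue
        # Split on the first colon only, so values containing colons stay intact.
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info


def member_uuids(info):
    """Return the member UUID list; the field name is an assumption about the output format."""
    raw = info.get("Sub-Cluster Member UUIDs", "")
    return [uuid.strip() for uuid in raw.split(",") if uuid.strip()]


# Illustrative sample only; the UUIDs are invented.
sample = """Cluster Information
   Enabled: true
   Local Node State: MASTER
   Sub-Cluster Member Count: 3
   Sub-Cluster Member UUIDs: uuid-aaaa, uuid-bbbb, uuid-cccc
"""

info = parse_cluster_get(sample)
members = member_uuids(info)
# Flag a mismatch between the reported count and the parsed list.
count_matches = int(info.get("Sub-Cluster Member Count", "0")) == len(members)
print(members, count_matches)
```

Running the same parse against output from each host makes it easier to spot a host whose member list differs from the others.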
 
Check why the ESXi host is part of the vSAN cluster but not part of the vSphere cluster. If the ESXi host was joined to the vSAN cluster in error using the command-line interface (CLI), use esxcli vsan cluster leave to take the ESXi host back out of the cluster.
 
Before running that command, put the ESXi host into maintenance mode with the full data migration option so that all data is evacuated and data availability is preserved.
 
If the ESXi host fails to leave the cluster, make a note of the reason. Also note any warnings or errors written to the /var/log/vmkernel.log file when the operation is attempted. Contact VMware Global Support Services if the issue persists. For more information, see Creating and managing Broadcom support cases.
 
If the list of hosts in the vSphere cluster matches the list of hosts in the vSAN cluster, restart the vsanmgmtd service on all hosts, then retest vSAN health:
/etc/init.d/vsanmgmtd restart

Restarting this service should not have any impact on the hosts.

