vSAN Storage Objects may become Inaccessible while shutting down/rebooting the vSAN backup host

Article ID: 314310

Products

VMware vSAN

Issue/Introduction

During maintenance or upgrade processes involving the shutdown or reboot of the vSAN backup host, users may notice objects becoming inaccessible in the vSAN environment. This issue may manifest in several ways:

Inaccessible objects reported by vSAN Skyline Health or other management tools.
Virtual machines (VMs) residing on the affected vSAN datastore may fail to start, migrate, or report as invalid.
VMDKs residing on the affected vSAN datastore may enter a read-only state.

Environment

VMware vSAN (All Versions)

Cause

This rare condition occurs when CMMDS (Cluster Monitoring, Membership, and Directory Service) on the vSAN Backup node loses the ability to receive heartbeats (HBs) but continues to transmit them for a brief period, most often during a reboot of the vSAN Backup host. Because the Backup node may become the Leader node during the reboot process, the cluster can be temporarily partitioned.

Resolution

vSAN engineering is aware of this issue and is working on a fix, to be included in the next release.

Workaround:
Either wait for the backup host to finish rebooting, after which the VMs/objects become accessible again, or, if the environment has critical VMs that cannot tolerate a temporary outage, follow the steps below to network-isolate the host.

1) Prior to scheduled maintenance, run the script below on any host in the cluster to identify the cluster Backup node:
echo -e "\nHostname: Backup_UUID"; SCMU=$(esxcli vsan cluster get | grep 'Sub-Cluster Backup' | awk -F '\: ' '{print $2}'); cmmds-tool find -f json -t HOSTNAME |grep -E "uuid|content"|sed 'N;s/\n/ /'|awk -F \" '{print $10": " $4}'|sort| grep $SCMU

Sample output
Hostname: Backup_UUID
esxi4.vsancluster.org: ########-####-####-####-#############
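
When logged in to a specific host, it can also be useful to check whether that host itself currently holds the Backup role. The following is a minimal sketch, not part of the official procedure: the helper name local_backup_check is hypothetical, and it assumes that the output of esxcli vsan cluster get contains "Local Node UUID" and "Sub-Cluster Backup UUID" lines in "Name: value" form on your ESXi build.

```shell
# Hypothetical helper: reads `esxcli vsan cluster get` output on stdin and
# reports whether the local host currently holds the Backup role.
# Assumption: the output contains "Local Node UUID: <uuid>" and
# "Sub-Cluster Backup UUID: <uuid>" lines.
local_backup_check() {
  awk -F ': ' '
    /Local Node UUID/         { local_uuid = $2 }
    /Sub-Cluster Backup UUID/ { backup_uuid = $2 }
    END {
      # No Backup UUID in the input: report it and return a non-zero status.
      if (backup_uuid == "") { print "no backup UUID found"; exit 2 }
      if (local_uuid == backup_uuid) {
        print "local host is the Backup node"
      } else {
        print "Backup node is " backup_uuid
      }
    }'
}
```

Usage on a host: esxcli vsan cluster get | local_backup_check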

2) Once the Backup host is identified, in vCenter select the host > Configure > VMkernel adapters > vSAN vmk, click the ellipsis (three dots) > Edit, and remove the vSAN tag to network-isolate the host.
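
If vCenter is not convenient, the vSAN tag can in principle also be removed from the host's command line. The sketch below is an illustration only and is not part of the documented procedure: the helper name print_vsan_untag_cmds is hypothetical, and it assumes that esxcli vsan network list prints a "VmkNic Name: vmkN" line for each vSAN-tagged interface. It only prints the esxcli vsan network remove commands so they can be reviewed before anything is run.

```shell
# Hypothetical dry-run helper: reads `esxcli vsan network list` output on
# stdin and prints (does NOT execute) one removal command per listed
# interface.
# Assumption: each interface appears as a "VmkNic Name: vmkN" line.
# Note: interfaces tagged for witness traffic may also be listed; review the
# printed commands before running any of them.
print_vsan_untag_cmds() {
  awk -F ': ' '/VmkNic Name/ { print "esxcli vsan network remove -i " $2 }'
}
```

Usage on the Backup host: esxcli vsan network list | print_vsan_untag_cmds
The vSAN tag has to be restored on the same VMkernel adapter once the maintenance is complete, for example through the same Edit dialog in vCenter.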

Note: For cluster upgrades using vLCM, exclude the backup host when upgrading the rest of the cluster, and upgrade it last, after it has been network-isolated using the steps above. Alternatively, upgrade the hosts manually one at a time instead of using vLCM, although this is not ideal for large clusters.

Additional Information

VMs/vSAN objects become temporarily inaccessible until the reboot of the backup node completes.
