vSAN -- During or after vSAN Hosts are patched/rebooted, vCenter and other Production VMs are not accessible
Article ID: 415211
Products
VMware vSAN
Issue/Introduction
While rebooting one or more vSAN Hosts of the same vSAN Cluster, you experience one or more of the following symptoms:
Virtual Machine(s) cannot be accessed remotely, for example via Remote Desktop (RDP)
The guest Operating System of the Virtual Machine(s) reports I/O errors (inside the VM)
Virtual Machine(s) show as inaccessible in the vSphere Client
Virtual Machine(s) show as invalid in the Host Client
Environment
vSphere ESXi - All Versions
Cause
More vSAN Hosts are unavailable to the Cluster than the configured Data Redundancy allows.
As a result, the vSAN Data of one or more VMs becomes inaccessible, so the affected VM(s) become inaccessible and/or report errors.
A vSAN Host is unavailable to the Cluster when it is offline, in Maintenance Mode, or unable to communicate with the other vSAN Hosts over the network.
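The definition above can be expressed as a small check. The following is a minimal Python sketch, not a VMware API: the `HostState` record and its field names are hypothetical, and the values would have to be collected by whatever inventory or monitoring tooling is in use.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical snapshot of one vSAN Host (field names are assumptions)."""
    name: str
    powered_on: bool
    in_maintenance_mode: bool
    network_reachable: bool  # can talk to the other vSAN Hosts


def is_unavailable(host: HostState) -> bool:
    # A Host counts as unavailable if it is offline, in Maintenance Mode,
    # or cannot communicate with the other vSAN Hosts.
    return (
        not host.powered_on
        or host.in_maintenance_mode
        or not host.network_reachable
    )


def count_unavailable(hosts: list[HostState]) -> int:
    # Number of Hosts currently unavailable to the Cluster.
    return sum(1 for host in hosts if is_unavailable(host))


if __name__ == "__main__":
    cluster = [
        HostState("esx01", True, False, True),
        HostState("esx02", True, True, True),    # in Maintenance Mode
        HostState("esx03", False, False, False), # powered off
        HostState("esx04", True, False, False),  # network isolated
    ]
    print(count_unavailable(cluster))  # prints 3
```

Note that a Host that is powered on and healthy still counts as unavailable while it is in Maintenance Mode or isolated from the vSAN network.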
Resolution
Data Redundancy is configured via the "Site disaster tolerance" and "Failures to tolerate" (FTT) settings in the Storage Policy assigned to the affected VM(s).
Depending on the configured FTT value, ensure that the number of unavailable vSAN Hosts never exceeds the limit given below:
FTT=0: No Host can be unavailable unless it was first put in Maintenance Mode. At most one Host can be in Maintenance Mode at any given time, and it must be entered with the option "Ensure Accessibility" or "Full Data Migration".
FTT=1: At most one Host can be unavailable to the Cluster at any given time.
FTT=2: At most two Hosts can be unavailable to the Cluster at any given time.
FTT=3: At most three Hosts can be unavailable to the Cluster at any given time.
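The limits above can be sketched as a pre-check to run before rebooting or patching further Hosts. This is an illustrative Python example under stated assumptions: the function names are hypothetical, the FTT value and the count of unavailable Hosts must be supplied by the operator, and the FTT=0 case is treated conservatively because the sketch does not model Maintenance Mode with "Ensure Accessibility" or "Full Data Migration".

```python
def max_unavailable_hosts(ftt: int) -> int:
    """Return how many Hosts may be unavailable at once for a given FTT."""
    if ftt not in (0, 1, 2, 3):
        raise ValueError(f"unsupported FTT value: {ftt}")
    # Per the list above, the limit equals the FTT value itself.
    return ftt


def reboot_is_safe(ftt: int, currently_unavailable: int,
                   hosts_to_reboot: int = 1) -> bool:
    """Check whether taking more Hosts offline stays within the FTT limit.

    With FTT=0 this always returns False for any reboot: such Hosts must
    first enter Maintenance Mode with data evacuation, which is outside
    the scope of this sketch.
    """
    total = currently_unavailable + hosts_to_reboot
    return total <= max_unavailable_hosts(ftt)


if __name__ == "__main__":
    # FTT=1, nothing else down: rebooting one Host is within the limit.
    print(reboot_is_safe(ftt=1, currently_unavailable=0))  # True
    # FTT=1, one Host still rebooting: starting a second reboot is not.
    print(reboot_is_safe(ftt=1, currently_unavailable=1))  # False
```

The second call reflects the typical cause of this issue: a second Host is rebooted before the first one has rejoined the Cluster.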
It is not recommended to use "Site disaster tolerance: None (standard cluster)" since this setting can increase the risk of Data unavailability during outages or Maintenance activities.
It is recommended to use "Site disaster tolerance: Site mirroring - stretched cluster" to ensure Data availability persists in the event an entire site goes down.