vSAN: Virtual machines may become unresponsive when multiple hosts are shut down beyond the storage policy FTT

Article ID: 424286

Products

VMware vSAN

Issue/Introduction

In a vSAN environment, if the number of simultaneously unavailable hosts exceeds the Failures To Tolerate (FTT) value in the storage policy, you may experience the following symptoms:

  • VM power operations (power on / shutdown) do not complete.
  • VM state becomes Invalid or Inaccessible.
  • Guest OS becomes unresponsive.
  • vSphere HA failover does not complete successfully and alarms may be triggered.

Environment

VMware vSAN 7.0
VMware vSAN 8.0

Cause

This is expected behavior given how vSAN applies the storage policy FTT.

If the number of simultaneously unavailable hosts exceeds the storage policy FTT, the required number of components for certain vSAN objects (for example, VMDKs) may not be available, and those objects can become inaccessible. When VM configuration files (.vmx) and/or virtual disks (.vmdk) are inaccessible, VM power operations can fail and VMs may appear unresponsive.

Depending on component placement, the VM configuration file, the virtual disk, or both may be affected; in each case, the VM can become unresponsive. In addition, vSphere HA restarts can remain pending because the candidate restart hosts are affected by the same loss of access to the objects.
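
The dependence on component placement can be illustrated with a minimal sketch. It assumes a simplified model in which each component carries one vote and an object stays accessible while more than half of its votes are reachable. The host names, the `object_accessible` helper, and the three-component layout are hypothetical, and the sketch ignores the additional requirement that a complete data replica be available.

```python
from typing import Dict, Set


def object_accessible(component_votes: Dict[str, int], down_hosts: Set[str]) -> bool:
    """Return True if more than half of the object's votes are still reachable.

    component_votes maps a host name to the votes held by the components
    placed on that host (simplified model, an assumption of this sketch).
    """
    total_votes = sum(component_votes.values())
    live_votes = sum(
        votes for host, votes in component_votes.items() if host not in down_hosts
    )
    return live_votes * 2 > total_votes


# Hypothetical FTT=1 mirror: two replicas plus a witness, one vote each.
vmdk = {"esx01": 1, "esx02": 1, "esx03": 1}

print(object_accessible(vmdk, {"esx01"}))           # True: one host down, within FTT=1
print(object_accessible(vmdk, {"esx01", "esx02"}))  # False: two component hosts down
print(object_accessible(vmdk, {"esx04", "esx05"}))  # True: the down hosts hold no components
```

The last call shows why only some objects may be affected: exceeding FTT makes an object inaccessible only when enough of the unavailable hosts hold its components.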

Resolution

If VMs are affected after multiple vSAN hosts are shut down, restore the stopped hosts to recover vSAN object availability.

Recover vSAN hosts

  1. Power on the ESXi hosts that are stopped.
  2. If the hosts were placed into Maintenance Mode, exit Maintenance Mode.
  3. After all hosts are up, verify that vSAN Object Health returns to Healthy in Skyline Health:

    vSphere Client > [Cluster] > Monitor > vSAN > Skyline Health > All > vSAN Object Health
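
Object health can also be summarized from the ESXi shell with `esxcli vsan debug object health summary get`. The sketch below is one way to post-process that text, assuming each data row consists of a status name followed by an object count. The `parse_health_summary` and `unhealthy` helpers are hypothetical, and the sample text is illustrative rather than captured output.

```python
from typing import Dict


def parse_health_summary(text: str) -> Dict[str, int]:
    """Collect 'status name -> object count' from rows that end in an integer."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 1)  # split off the last whitespace-separated token
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0].strip()] = int(parts[1])
    return counts


def unhealthy(counts: Dict[str, int]) -> Dict[str, int]:
    """Keep only non-healthy states that currently have at least one object."""
    return {state: n for state, n in counts.items() if state != "healthy" and n > 0}


# Illustrative sample only; real output lists more states.
SAMPLE = """\
Health Status  Number Of Objects
-------------  -----------------
inaccessible                   2
healthy                       40
"""

summary = parse_health_summary(SAMPLE)
print(unhealthy(summary))  # {'inaccessible': 2}
```

An empty result from `unhealthy` would suggest that every object is reported as healthy; the Skyline Health view remains the reference check.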



Recover virtual machines

  1. If a VM is shown as Invalid or is listed by a vSAN object UUID rather than its name, restart the ESXi host management service (hostd).
    For restart instructions, see Restarting Management Agents in ESXi.
  2. Restart impacted VMs as needed to recover I/O.
  3. If the guest OS filesystem is affected, repair the filesystem in the guest OS or restore from backup if required.

Operational considerations

  • Do not shut down more hosts simultaneously than the configured FTT allows.
  • When entering Maintenance Mode, select Ensure accessibility to maintain access to required vSAN objects for VM operation.
  • If you anticipate scenarios where multiple hosts may be taken offline, consider increasing FTT in the storage policy in advance.
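
The first consideration can be expressed as a simple pre-check before planned maintenance. This is a conservative sketch, not a vSAN API: the `shutdown_within_ftt` helper and its parameters are hypothetical, and it treats every object as if its components could reside on any of the stopped hosts.

```python
from typing import Iterable


def shutdown_within_ftt(
    policy_ftts: Iterable[int], hosts_unavailable: int, hosts_to_stop: int
) -> bool:
    """Return True if the planned shutdown stays within the lowest FTT in use.

    policy_ftts: FTT values of the storage policies applied in the cluster.
    hosts_unavailable: hosts that are already down or in Maintenance Mode.
    hosts_to_stop: additional hosts the operator plans to shut down.
    """
    ftts = list(policy_ftts)
    if not ftts:
        raise ValueError("at least one storage policy FTT is required")
    lowest_ftt = min(ftts)  # the least protected objects set the limit
    return hosts_unavailable + hosts_to_stop <= lowest_ftt


# Hypothetical cluster with policies using FTT=1 and FTT=2.
print(shutdown_within_ftt([1, 2], hosts_unavailable=0, hosts_to_stop=1))  # True
print(shutdown_within_ftt([1, 2], hosts_unavailable=0, hosts_to_stop=2))  # False
print(shutdown_within_ftt([1, 2], hosts_unavailable=1, hosts_to_stop=1))  # False
```

Using the lowest FTT across policies errs on the side of caution, since the least protected objects are the first to become inaccessible.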

Additional Information