Understanding vSAN Stretched Cluster Failure Scenarios

Article ID: 394978

Products

VMware vSAN

Issue/Introduction

When operating a vSAN stretched cluster environment, administrators need to understand how different failure scenarios impact virtual machine availability and data accessibility. This article provides detailed failure scenario tables showing the expected behavior for various failure types including host failures, site failures, witness failures, partition failures, and inter-site link (ISL) failures.

Administrators may observe:

  • Virtual machines becoming inaccessible during certain failure conditions
  • Different behaviors based on the Site Disaster Tolerance policy configuration
  • Varying impacts depending on whether Secondary Failures to Tolerate (FTT) is configured
  • Questions about data availability when multiple failures occur

Environment

VMware vSAN 7.x, 8.x, 9.x

Cause

The behavior during failure scenarios in vSAN stretched clusters is determined by the interaction between the Site Disaster Tolerance policy setting and the Secondary Failures to Tolerate (FTT) configuration. When failures occur, vSAN uses a voting mechanism to determine object availability based on component distribution across sites and the witness host. The specific combination of policy settings directly influences whether objects remain accessible and if virtual machines can continue running or need to be restarted during various failure conditions.
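The voting mechanism described above can be illustrated with a minimal sketch. This is a hypothetical Python model for reasoning about the tables that follow, not a VMware API: an object remains accessible only while a strict majority of its component votes is reachable from the surviving partition.

```python
# Illustrative model (not VMware code) of vSAN stretched-cluster quorum.
# An object's components carry votes in Site A, Site B, and on the witness;
# the object stays accessible only while a strict majority of its total
# votes remains reachable.

def object_accessible(votes, reachable):
    """votes: dict mapping 'site_a'/'site_b'/'witness' to vote counts.
    reachable: set of locations reachable from the surviving partition."""
    total = sum(votes.values())
    alive = sum(v for loc, v in votes.items() if loc in reachable)
    return alive * 2 > total  # strict majority required for quorum

# Site-mirrored object: one vote per data site, one on the witness.
mirrored = {"site_a": 1, "site_b": 1, "witness": 1}

# Full failure of Site A: Site B plus the witness hold 2 of 3 votes.
print(object_accessible(mirrored, {"site_b", "witness"}))  # True

# Simultaneous failure of Site A and the witness: quorum is lost.
print(object_accessible(mirrored, {"site_b"}))             # False
```

This simple majority rule is what the scenario tables below repeatedly apply: any combination of failures that leaves fewer than half the votes reachable renders the object inaccessible.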

Resolution

Understanding the expected behavior for each failure scenario helps in planning disaster recovery strategies and setting appropriate storage policies. The following tables detail the behavior for each failure type based on policy configuration.

Host Failure Scenarios

| Site Disaster Tolerance | Secondary FTT | VM Location | Failure | vSAN Behavior | VM Behavior |
| --- | --- | --- | --- | --- | --- |
| None - Preferred | No data redundancy | Site A or B | Host failure in Site A | Objects are inaccessible if the failed host contains one or more components of an object | VM cannot be restarted as the object is inaccessible |
| None - Preferred | RAID-1/5/6 | Site A or B | Host failure in Site A | Objects are accessible as there is site-local resilience | VM does not need to be restarted unless the VM was running on the failed host |
| Site Mirroring | No data redundancy | Site A or B | Host failure in Site A or B | Components on the failed host are inaccessible; read and write I/O goes across the ISL without local redundancy, and the rebuild occurs across the ISL | VM does not need to be restarted unless the VM was running on the failed host |
| Site Mirroring | RAID-1/5/6 | Site A or B | Host failure in Site A or B | Components on the failed host are inaccessible; read I/O is served locally due to RAID, and the rebuild occurs locally | VM does not need to be restarted unless the VM was running on the failed host |

Partition Failure Scenarios

| Site Disaster Tolerance | Secondary FTT | VM Location | Failure | vSAN Behavior | VM Behavior |
| --- | --- | --- | --- | --- | --- |
| None - Preferred | No data redundancy | Site B | Partition of Site B | Objects are accessible in Site B | VM resides in Site B and does not need to be restarted |
| Site Mirroring | No data redundancy | Site A | Partition of Site A | Objects are inaccessible in Site A as the full site is partitioned and quorum is lost | VM restarted in Site B |
| Site Mirroring | No data redundancy | Site B | Partition of Site A | Objects are inaccessible in Site A as the full site is partitioned and quorum is lost | VM does not need to be restarted as it resides in Site B |

Site Failure Scenarios

| Site Disaster Tolerance | Secondary FTT | VM Location | Failure | vSAN Behavior | VM Behavior |
| --- | --- | --- | --- | --- | --- |
| None - Preferred | No data redundancy | Site A | Full failure of Site A | Objects are inaccessible as the full site failed | VM cannot be restarted in Site B, as all objects reside in Site A |
| None - Preferred | No data redundancy | Site B | Full failure of Site B | Objects are accessible, as only Site A contains objects | VM can be restarted in Site A, as that is where all objects reside |
| Site Mirroring | No data redundancy | Site A | Full failure of Site A | Objects are inaccessible in Site A as the full site failed | VM restarted in Site B |
| Site Mirroring | No data redundancy | Site B | Full failure of Site A | Objects are inaccessible in Site A as the full site failed | VM does not need to be restarted as it resides in Site B |
| Site Mirroring | No data redundancy | Site A | Full failure of Site A and simultaneous host failure in Site B | Objects are inaccessible in Site A. If components reside on the failed host, the object is also inaccessible in Site B | VM cannot be restarted |
| Site Mirroring | No data redundancy | Site A | Full failure of Site A and simultaneous host failure in Site B | Objects are inaccessible in Site A. If components do not reside on the failed host, the object is accessible in Site B | VM restarted in Site B |
| Site Mirroring | RAID-1/5/6 | Site A | Full failure of Site A and simultaneous host failure in Site B | Objects are inaccessible in Site A, but accessible in Site B as there is site-local resilience | VM restarted in Site B |

Witness Failure Scenarios

| Site Disaster Tolerance | Secondary FTT | VM Location | Failure | vSAN Behavior | VM Behavior |
| --- | --- | --- | --- | --- | --- |
| None - Preferred | No data redundancy | Site A | Witness host failure | No impact; the witness host is not used as data is not replicated | No impact |
| None - Non-Preferred | No data redundancy | Site B | Witness host failure | No impact; the witness host is not used as data is not replicated | No impact |
| Site Mirroring | No data redundancy | Site A | Witness host failure | Witness object inaccessible; VM remains accessible | VM does not need to be restarted |
| Site Mirroring | No data redundancy | Site B | Witness host failure | Witness object inaccessible; VM remains accessible | VM does not need to be restarted |
| Site Mirroring | No data redundancy | Site A | Full failure of Site A and simultaneous witness host failure | Objects are inaccessible in Site A and Site B due to quorum being lost | VM cannot be restarted |
| Site Mirroring | No data redundancy | Site A | Full failure of Site A followed by witness host failure a few minutes later | Pre vSAN 7.0 U3: Objects are inaccessible in Site A and Site B due to quorum being lost | VM cannot be restarted |
| Site Mirroring | No data redundancy | Site A | Full failure of Site A followed by witness host failure a few minutes later | vSAN 7.0 U3 and later: Objects are inaccessible in Site A, but accessible in Site B as votes have been recounted | VM restarted in Site B |
| Site Mirroring | No data redundancy | Site B | Full failure of Site B followed by witness host failure a few minutes later | vSAN 7.0 U3 and later: Objects are inaccessible in Site B, but accessible in Site A as votes have been recounted | VM restarted in Site A |

Inter-Site Link (ISL) Failure Scenarios

| Site Disaster Tolerance | Secondary FTT | VM Location | Failure | vSAN Behavior | VM Behavior |
| --- | --- | --- | --- | --- | --- |
| Site Mirroring | No data redundancy | Site A | Network failure between Site A and Site B (ISL down) | Site A binds with the witness, and objects in Site B become inaccessible | VM does not need to be restarted |
| Site Mirroring | No data redundancy | Site B | Network failure between Site A and Site B (ISL down) | Site A binds with the witness, and objects in Site B become inaccessible | VM restarted in Site A |
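The tie-break in the table above can be sketched as follows. This is an illustrative Python model, not VMware code, and it assumes Site A is the preferred site: with the ISL down, the witness binds with the preferred site, so that partition holds the vote majority and keeps its objects accessible.

```python
# Hypothetical model of the ISL-failure tie-break: with the inter-site link
# down, each data site forms its own partition, and the witness joins the
# preferred site's partition, giving it the vote majority.

def surviving_partition(preferred="site_a"):
    """Return the name of the partition that keeps quorum when the ISL fails."""
    partitions = {
        "site_a": {"site_a"},
        "site_b": {"site_b"},
    }
    partitions[preferred].add("witness")  # witness binds with the preferred site
    votes = {"site_a": 1, "site_b": 1, "witness": 1}
    # The partition holding a strict majority of votes keeps objects accessible.
    for name, members in partitions.items():
        if sum(votes[m] for m in members) * 2 > sum(votes.values()):
            return name
    return None

print(surviving_partition())  # site_a: VMs running in Site B are restarted in Site A
```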

Adaptive Quorum Control

vSAN 7.0 U3 introduced Adaptive Quorum Control (AQC) to improve data availability during specific failure conditions. This feature maintains the availability of objects during a site failure (or site maintenance) followed by subsequent unavailability of the witness host.

In a fully operational stretched cluster, quorum is determined through a voting mechanism that accounts for object components in both sites and the witness host appliance. When a data site experiences a planned or unplanned outage, vSAN adjusts the votes to favor the active site that still has quorum. This adjustment allows sufficient votes to maintain quorum and keeps data available during a planned or unplanned outage of the witness host appliance.

The vote adjustment process may take a few seconds to a few minutes depending on cluster size. As each object completes adjustment, that object can tolerate witness host failure while maintaining availability. This capability does not protect against simultaneous failure of a data site and witness.
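The vote recount can be sketched as follows. This is an assumed illustrative model, not VMware code, and the specific vote counts are hypothetical: after a data-site outage, vSAN re-weights votes toward the surviving site so that a later witness failure no longer breaks quorum.

```python
# Illustrative model (not VMware code) of Adaptive Quorum Control (AQC).
# Quorum requires a strict majority of all votes to be reachable.

def has_quorum(votes, reachable):
    alive = sum(v for loc, v in votes.items() if loc in reachable)
    return alive * 2 > sum(votes.values())

votes = {"site_a": 1, "site_b": 1, "witness": 1}

# Site A fails. Before any recount, losing the witness too drops quorum:
# Site B alone holds 1 of 3 votes.
print(has_quorum(votes, {"site_b"}))            # False

# AQC recount after the Site A outage: the surviving site's components
# absorb extra votes (counts here are illustrative), so a subsequent
# witness failure leaves Site B with a majority on its own.
votes_after_aqc = {"site_a": 1, "site_b": 3, "witness": 1}
print(has_quorum(votes_after_aqc, {"site_b"}))  # True
```

This matches the witness-failure table above: a simultaneous site-plus-witness failure loses quorum, while a witness failure a few minutes after the site failure (post recount) does not.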

Recovery from Complex Failures

In a double site failure, where one data site fails simultaneously with the witness site, data and VMs become unavailable because quorum cannot be achieved. This protection mechanism prevents data from being updated independently in two different locations (a split-brain condition).

There may be a chance to recover the data in the single remaining site when it is known that the other data site and the witness site are not coming back. For all versions up to and including vSAN 8 U3 (VCF 5.2), this involves contacting Global Support (GS) to determine the viability of a potential recovery. Please note that this is a best-effort procedure and does not guarantee the consistency of data inside guest VMs when recovering from stale components.

Additional Information

For vSAN stretched clusters, avoid using a storage policy with locality=none. With such a policy, the components of the same replica can be spread across both data sites in the cluster. This can result in:

  • Undesired issues during reconfiguring tasks of an object such as storage policy changes
  • Issues when placing a host into maintenance mode with ensure accessibility
  • Possibility of objects going inaccessible during planned maintenance
  • Read locality not being guaranteed as reads may go across data sites via the Inter-Site Link (ISL), resulting in latency

When a storage policy's Site disaster tolerance is set to one of the options below, with RAID set to RAID-1/5/6, writes are limited to the site to which the locality is set:

  • Dual site mirroring (stretch cluster)
  • None - keep data on Preferred (stretch cluster)
  • None - keep data on Secondary (stretch cluster)

The issue is specific to stretched cluster storage policies with Site disaster tolerance set to either "None - standard cluster" or "None - stretched cluster" and RAID set to RAID-1/5/6.

For more details, see: