All VMs inaccessible after failover on vSAN stretched cluster

Article ID: 402974

Products

VMware vSAN

Issue/Introduction

  • The vSAN cluster uses fully separate physical network infrastructure for vSAN networking
  • The preferred site experiences a full vSAN network partition, for example because the switch used for vSAN networking has failed
  • The witness-tagged VMkernel port remains active and connected on the preferred site's hosts (see the verification sketch after this list)
  • The witness node does not cluster with the secondary site and remains clustered only with the preferred site's original leader node
  • Due to the loss of quorum, virtual machines and data objects on the vSAN datastore become inaccessible
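You can confirm that the witness-tagged port is still up on the preferred site hosts with esxcli. A minimal sketch (the interface name is reported by the command itself):

# List vSAN-enabled VMkernel interfaces and their traffic types;
# a witness-tagged port is reported with Traffic Type: witness
esxcli vsan network list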

Environment

vSAN 8.0 U3 (any U3 build) and 9.0

vSAN stretched cluster

Cause

The secondary site leader node reports a fitness score of 0 to the witness appliance when queried. Because both the preferred and secondary site leaders report a fitness of 0, the witness remains clustered with the original preferred site leader.
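To see which nodes the witness is currently clustered with, you can query CMMDS membership on the witness host or on any data node. A minimal sketch:

# Show the local CMMDS role and the current sub-cluster membership;
# on a partitioned cluster, the Sub-Cluster Member UUIDs list shows
# which nodes remain clustered together
esxcli vsan cluster get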

Looking at the cluster fitness (which is the sum of the node fitness entries), we see on the leader node:

var/run/log/vsansystem.log:2025-06-25T13:54:05.195Z In(166) vsansystem[2102099]: [vSAN@6876 sub=VsanSystemProvider opId=CMMDSLocalObjectEntryUpdate-859c] Updating node fitness value to 0

On the backup node in the secondary site, we see it take over as leader:

var/run/log/vmkernel.log:2025-06-25T13:54:53.170Z In(182) vmkernel: cpu24:2100572)CMMDS: CMMDSLogStateTransition:1838: ########-####-####-####-############: Transitioning(########-####-####-####-############) from Backup to Leader: (Reason: Backup is taking over the cluster leader)

And the secondary site leader's fitness is:

var/run/log/vsansystem.log:2025-06-25T13:51:06.582Z In(166) vsansystem[2102139]: [vSAN@6876 sub=VsanSystemProvider opId=CMMDSLocalObjectEntryUpdate-f56c] Updating node fitness value to 0
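To locate these entries on your own hosts, you can grep the same logs. A sketch, assuming the live-host log paths under /var/run/log (a support bundle stores them under var/run/log relative to the bundle root):

# Fitness values reported by vsansystem on each node
grep "Updating node fitness value" /var/run/log/vsansystem.log

# CMMDS role transitions, such as a backup taking over as leader
grep "CMMDSLogStateTransition" /var/run/log/vmkernel.log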

Resolution

This issue will be resolved in a future release of vSAN 8.0 U3 and a future release of vSAN 9.0.

You may work around this issue by disabling the witness traffic tag on the preferred site hosts, forcing the witness to communicate only with the secondary site.
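A minimal sketch of the workaround using standard esxcli commands, assuming witness traffic is separated onto its own VMkernel port (vmk1 below is a placeholder; use the interface reported on your hosts). Run this on each preferred site host:

# Identify the witness-tagged interface (Traffic Type: witness)
esxcli vsan network list

# Remove the witness-tagged interface from the vSAN network configuration
esxcli vsan network remove -i vmk1

Once the preferred site's vSAN network is restored, the witness tag can be re-added with esxcli vsan network ip add -i vmk1 -T=witness.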