After planned maintenance and a move of a vSAN cluster, VMs are inaccessible.

Article ID: 391186

Updated On:

Products

VMware vSAN

Issue/Introduction

After all hosts in the cluster are rebooted, running the recovery.py script fails because the hosts cannot communicate across the vSAN network.

VMs show as inaccessible in vCenter.

Environment

vSAN (all versions)

Cause

During the maintenance or move of the vSAN cluster, the networking was changed, resulting in a vSAN cluster network partition. This can be caused by a physical network change, such as mis-cabling the cluster, or by improper configuration at the new site.

To function properly, all vSAN hosts must be able to communicate with each other over the vSAN network. If some ESXi hosts in the cluster cannot reach the others, the vSAN cluster splits into multiple network partitions, that is, sub-groups of ESXi hosts that can talk to each other but not to hosts in other sub-groups.
 
When this occurs, vSAN objects may become unavailable until the network misconfiguration is resolved. For smooth operation of a production vSAN cluster, it is very important to have a stable network in which all hosts form a single vSAN network partition.
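To illustrate how partitions form, the following is a minimal Python sketch, not a VMware tool. It assumes you have already collected which host pairs can reach each other over the vSAN network (for example, from manual ping tests) and groups the hosts into sub-groups. The host names, the input format, and the function name find_partitions are hypothetical, and treating reachability as transitive is a simplification of how vSAN determines cluster membership.

```python
def find_partitions(hosts, reachable_pairs):
    """Group hosts into partitions using union-find.

    hosts: list of host names (hypothetical labels).
    reachable_pairs: list of (host_a, host_b) tuples for pairs that
    can communicate over the vSAN network in both directions.
    """
    parent = {h: h for h in hosts}

    def find(h):
        # Walk up to the representative host, compressing the path.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    # Merge the groups of every pair that can communicate.
    for a, b in reachable_pairs:
        parent[find(a)] = find(b)

    # Collect hosts under their representative.
    groups = {}
    for h in hosts:
        groups.setdefault(find(h), []).append(h)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical example: a mis-cabled four-host cluster.
hosts = ["esx01", "esx02", "esx03", "esx04"]
reachable = [("esx01", "esx02"), ("esx03", "esx04")]
partitions = find_partitions(hosts, reachable)
print(partitions)
print("Healthy (single partition):", len(partitions) == 1)
```

In this example the hosts fall into two sub-groups, which corresponds to a partitioned cluster. A healthy cluster yields exactly one group containing every host.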
 

Resolution

Refer to the following articles for how to run vmkping tests across the vSAN network:

Testing VMkernel network connectivity with the vmkping command

vSAN Health Service - Network Configuration - vMotion: Basic Connectivity Check

Ensure all cabling of the cluster is correct by physically inspecting the network cabling.
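As a convenience when many hosts are involved, the vmkping tests described in the articles above can be prepared with a short script. The following Python sketch only builds the command lines to run on an ESXi host. It assumes the vSAN VMkernel interface is vmk1 and uses placeholder peer IP addresses; both are hypothetical and must be replaced with the values from your environment. The -I option selects the outgoing VMkernel interface, and -d with -s sends a non-fragmented packet of a given payload size, which is useful for checking MTU consistency.

```python
def build_vmkping_commands(vmk_interface, peer_ips, payload_size=None):
    """Return one vmkping argument list per peer vSAN IP address.

    vmk_interface: VMkernel interface tagged for vSAN (assumed, e.g. "vmk1").
    peer_ips: vSAN IP addresses of the other hosts (placeholders here).
    payload_size: optional payload size in bytes; when set, the
    don't-fragment flag is added so MTU mismatches show up as failures.
    """
    commands = []
    for ip in peer_ips:
        cmd = ["vmkping", "-I", vmk_interface]
        if payload_size is not None:
            cmd += ["-d", "-s", str(payload_size)]
        cmd.append(ip)
        commands.append(cmd)
    return commands


# Hypothetical peers on the vSAN network.
peers = ["192.168.10.12", "192.168.10.13"]

# Basic connectivity checks.
for cmd in build_vmkping_commands("vmk1", peers):
    print(" ".join(cmd))

# MTU checks for a standard 1500-byte MTU (1472-byte payload).
for cmd in build_vmkping_commands("vmk1", peers, payload_size=1472):
    print(" ".join(cmd))
```

Run the printed commands from each host toward every other host. Any peer that does not respond indicates where the cabling or network configuration should be inspected first.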