After a power outage the vSAN datastore is not available / VMs show as Invalid

Article ID: 392542

Products

VMware vSAN

Issue/Introduction

After a power outage, once power is restored, the vSAN datastore shows as inaccessible and VMs may appear to be in an invalid state.

In the Host UI, the vSAN datastore does not show its total expected size. It shows only the capacity of the disks on that host, or the combined capacity of the hosts that can still communicate with each other.

 

Environment

VMware vSAN (any version) 

Cause

Even when power has been restored overall, it is important to confirm that every piece of infrastructure has power and is working correctly after an outage, especially network switches. If a switch is still down, the result can be a vSAN network partition, which renders VMs inaccessible.

Resolution

To verify a cluster partition caused by networking, in an environment where vCenter resides on the vSAN datastore, follow the steps below.

  1. SSH into a host and run the following command: esxcli vsan cluster get
  2. Run command esxcli vsan health cluster list -w to check vSAN Health
    Health Test Name                                                       Status
    ---------------------------------------------------------------------  ------
    Overall health                                                         red (Network misconfiguration)
    Network                                                                red
      Hosts with connectivity issues (hostconnectivity)                    red
      vSAN cluster partition (clusterpartition)                            red
      All hosts have a vSAN vmknic configured (vsanvmknic)                 green
      vSAN: Basic (unicast) connectivity check (smallping)                 green
      vSAN: MTU check (ping with large packet size) (largeping)            green
      vMotion: Basic (unicast) connectivity check (vmotionpingsmall)       green
      vMotion: MTU check (ping with large packet size) (vmotionpinglarge)  green
      Network latency check (hostlatencycheck)                             green
    Data                                                                   red
      vSAN object health (objecthealth)                                    red
      vSAN object format health (objectformat)                             green
    Cluster                                                                yellow
      Advanced vSAN configuration in sync (advcfgsync)                     green
      vSAN daemon liveness (clomdliveness)                                 green
      vSAN Disk Balance (diskbalance)                                      green
      Resync operations throttling (resynclimit)                           green
      Software version compatibility (upgradesoftware)                     green
      Disk format version (upgradelowerhosts)                              yellow
    Capacity utilization                                                   yellow
      Storage space (diskspace)                                            yellow
      Read cache reservations (rcreservation)                              green
      Component (nodecomponentlimit)                                       green
      What if the most consumed host fails (limit1hf)                      green
    Performance service                                                    yellow
      Performance service status (perfsvcstatus)                           yellow
    Physical disk                                                          green
      Operation health (physdiskoverall)                                   green
      Disk capacity (physdiskcapacity)                                     green
      Congestion (physdiskcongestion)                                      green
      Component limit health (physdiskcomplimithealth)                     green
      Component metadata health (componentmetadata)                        green
      Memory pools (heaps) (lsomheap)                                      green
      Memory pools (slabs) (lsomslab)                                      green
  3. Run the command esxcli vsan health cluster get -t clusterpartition to see how the cluster is partitioned
  4. Run the command esxcli vsan network list to see which vmk interface is configured for vSAN
  5. Run the script below to check network connectivity between the vSAN hosts, replacing vmk(x) with the vmk interface name returned by the command in step 4

    for i in `localcli vsan cluster unicastagent list | grep true | awk '{ print $4}'`; do echo "pinging $i 20 times"; echo; vmkping -I vmk(x) $i -s 1472 -d -c 20 -i .05; echo; echo "**************************"; done
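
The health listing in step 2 is long, and during an outage only the non-green rows matter. The following is a minimal sketch of a filter for that output. The function name non_green_checks is hypothetical, and the filter assumes the status column uses the words red, yellow and green exactly as in the sample output above.

```shell
# Print only the health checks whose status is red or yellow.
# Reads the text output of "esxcli vsan health cluster list -w" on stdin.
# Assumption: the status appears as a separate word (red/yellow/green),
# as in the sample output shown in step 2.
non_green_checks() {
    grep -E '[[:space:]](red|yellow)([[:space:]]|$)'
}

# A few lines taken from the sample output in step 2 (spacing shortened).
sample='Overall health                                   red (Network misconfiguration)
Network                                          red
  vSAN cluster partition (clusterpartition)      red
  All hosts have a vSAN vmknic configured (vsanvmknic)  green
Cluster                                          yellow'

# Prints the four non-green lines and drops the green one.
printf '%s\n' "$sample" | non_green_checks
```

On a host, the same function could be fed from the live command, for example: esxcli vsan health cluster list -w | non_green_checks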

If there is a network partition, correct the underlying networking issue to restore communication and resolve the partition, which will make data accessible again. Please see: vSAN Network Partition
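
Once the network has been corrected, one way to confirm that the partition is gone is to compare the member count reported by esxcli vsan cluster get (step 1) with the number of hosts expected in the cluster. The sketch below assumes the output contains a "Sub-Cluster Member Count" line. The function names and the expected-host argument are hypothetical, and the sample input is abbreviated and illustrative rather than captured from a real host.

```shell
# Extract the "Sub-Cluster Member Count" value from the text output of
# "esxcli vsan cluster get" supplied on stdin.
member_count() {
    awk -F': *' '/Sub-Cluster Member Count/ { print $2 + 0; exit }'
}

# Compare the reported count with the number of hosts the operator expects.
# $1 is a hypothetical argument: the total number of hosts in the cluster.
check_membership() {
    expected_hosts="$1"
    actual="$(member_count)"
    if [ "$actual" = "$expected_hosts" ]; then
        echo "OK: $actual of $expected_hosts hosts in this partition"
    else
        echo "MISMATCH: ${actual:-unknown} of $expected_hosts hosts in this partition"
        return 1
    fi
}

# Illustrative input only (abbreviated, hypothetical values).
sample='Cluster Information
   Enabled: true
   Sub-Cluster Member Count: 2'

# "|| true" keeps the demo from aborting a script that runs with "set -e".
printf '%s\n' "$sample" | check_membership 4 || true
```

On a host, the live output could be piped in instead, for example: esxcli vsan cluster get | check_membership 4. A count lower than expected means this host currently sees only part of the cluster.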