After a power outage the VMs on vSAN datastore show as Invalid

Article ID: 419694

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • Virtual machines on vSAN datastores report as invalid after recovering from a power outage

  • vCenter Server also resides on the vSAN datastore and is marked as invalid

  • The vSAN health check indicates physical disk issues:

    esxcli vsan health cluster list

    esxcli vsan health cluster list
    Overall health findings                             red (Physical disk issue)
    Physical disk                                       red
      Physical disk health retrieval issues             red
      Operation health                                  yellow
      Congestion                                        green
      Physical disk component utilization               green
      Component metadata health                         green
      Memory pools (heaps)                              green
      Memory pools (slabs)                              green
      Disk capacity                                     green
    Data                                                red
      vSAN object health                                red
      vSAN object format health                         green
    Performance service                                 red
      Stats DB object                                   red
      Stats primary election                            green
      Performance data collection                       green
      All hosts contributing stats                      green
      Stats DB object conflicts                         green
    Capacity utilization                                yellow
  • The objects are also in an inaccessible state, which is why the virtual machines are marked as invalid. Use the following command to verify the state of the objects:

    esxcli vsan debug object health summary get

  • Checking the capacity utilization of the disks shows that a few hosts do not have any disk groups. Note that there are no compute-only nodes in the cluster.

    To check the capacity utilization, use the following command:

    cmmds-tool find -t HOSTNAME -f json | egrep "uuid|hostname" | sed -e 's/\"content\"://g' | awk '{print $2}' | sed -e 's/[\",\},\,]//g' | xargs -n 2 | while read hostuuid hostname; do echo -e "\n\nHost Name: $hostname::: Host UUID: $hostuuid\n Disk Name\t\t| Disk UUID\t\t| Disk Usage     | Disk Capacity | Usage Percentage" ; cmmds-tool find -f python -t DISK -o $hostuuid | grep uuid | cut -c 13-48 | while read diskuuid;do cmmds-tool find -f json -t DISK -o $hostuuid -u $diskuuid| egrep "uuid|content" | sed -e 's/\"content\":|\\"uuid\"://g' | sed -e 's/[\",\},\]//g' | awk '{printf $0}' | sed -e 's/},/\n/g'| awk '{print $37 " " $5 " " $45}'| while read disknaa diskcap maxcomp; do diskcapused=$(cmmds-tool find -f json -t DISK_STATUS -u $diskuuid | grep content |sed -e 's/[\",\},\]//g' | awk '{print $3}'); diskperc=$(echo "$diskcapused $diskcap" | awk '{print $1/$2*100}'); if [ "$maxcomp" != 0 ]; then echo -en " $disknaa\t| $diskuuid\t| $diskcapused\t | $diskcap\t | $diskperc%\n"; fi;done;done;done;

  • Running the following command on all ESXi hosts in the cluster shows that, on a few of the hosts, the disks are not recognized by CMMDS:

    esxcli vsan storage list | grep -i cmmds

    Sample output:

    esxcli vsan storage list | grep -i cmmds
       In CMMDS: false
       In CMMDS: false
       In CMMDS: false
       In CMMDS: false
       In CMMDS: false
       In CMMDS: false
       In CMMDS: false

    On a healthy ESXi host with no physical disk issues, the output is as follows:
    esxcli vsan storage list | grep -i cmmds
       In CMMDS: true
       In CMMDS: true
       In CMMDS: true
       In CMMDS: true
       In CMMDS: true
       In CMMDS: true
       In CMMDS: true
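
The per-host CMMDS check above can be condensed into a single summary line. The following is a minimal sketch, assuming the `In CMMDS:` field appears exactly as in the sample output; the function name `count_cmmds` is hypothetical and not part of any VMware tooling.

```shell
#!/bin/sh
# Hypothetical helper: summarize the "In CMMDS" flags from the output of
# `esxcli vsan storage list`. It reads that output on stdin, so it can also
# be run against output saved to a file.
count_cmmds() {
    awk -F': *' '
        /In CMMDS:/ {
            # $2 is the value after "In CMMDS:" (true or false)
            if (tolower($2) ~ /^true/) in_cmmds++
            else missing++
        }
        END { printf "in_cmmds=%d not_in_cmmds=%d\n", in_cmmds, missing }
    '
}

# Usage on an ESXi host (assumes esxcli is available in the shell):
#   esxcli vsan storage list | count_cmmds
```

A non-zero `not_in_cmmds` count on a host corresponds to the `In CMMDS: false` lines shown in the affected-host sample.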

Environment

VMware vSAN 8.x

Cause

This issue occurs because Cluster Monitoring, Membership, and Directory Services (CMMDS) is unable to validate the state of the disks residing on the hosts. As a result, these disks report a "Stale" or "Unknown" status within the cluster directory. Because the disks are not recognized as active members, vSAN marks the data components residing on them as Absent, which causes the associated objects to lose quorum and become inaccessible.

Cause Validation:

In the /var/run/log/vmkernel.log file on the ESXi host, events such as the following indicate that the disks are detected as stale:

2025-11-20T05:10:47.515Z In(182) vmkernel: cpu5:153229876)PLOG: PLOGMapDataPartition:3026: Mapping SSD cache data partition for 52a9e078-xxxx-xxxx-xxxx-xxxxxxxxxxxx not found SSD device mapped:0x0 fromRescan 0x1
2025-11-20T05:10:47.516Z In(182) vmkernel: cpu7:153229876)PLOG: PLOGProbeDevice:7022: Probed plog device <naa.6000xxxxxxxxxxxxxxxxxxxxx:1> 52a9e078-xxxx-xxxx-xxxx-xxxxxxxxxxxx 0x45xxxxxxxxxx exists.. continue with old entry
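
To look for this signature without reading the whole log, the messages can be filtered by pattern. This is a minimal sketch, assuming the `PLOGMapDataPartition` message wording matches the sample above; the function name `stale_plog_uuids` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the unique UUIDs named in
# "PLOGMapDataPartition ... data partition for <uuid> not found" messages.
# Reads vmkernel log lines on stdin.
stale_plog_uuids() {
    grep 'PLOGMapDataPartition' |
        grep 'not found' |
        sed -n 's/.*data partition for \([^ ]*\) not found.*/\1/p' |
        sort -u
}

# Usage on an ESXi host:
#   stale_plog_uuids < /var/run/log/vmkernel.log
```

An empty result only means this particular message was not logged in the file that was scanned; rotated logs would need to be checked separately.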

Resolution

Place the affected host into maintenance mode with the No action option, then reboot the host.
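
From the ESXi shell, the maintenance-mode step might look like the sketch below. It assumes the `--vsanmode noAction` option of `esxcli system maintenanceMode set` corresponds to the No action choice; confirm this with `esxcli system maintenanceMode set --help` on your build. The wrapper name `enter_mm_and_reboot` and the `APPLY` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: print the maintenance-mode and reboot commands, and
# run them only when APPLY=1 is set. The dry-run default is a design choice
# of this sketch, not part of the documented procedure.
enter_mm_and_reboot() {
    for cmd in \
        "esxcli system maintenanceMode set --enable true --vsanmode noAction" \
        "reboot"
    do
        echo "$cmd"
        if [ "${APPLY:-0}" = "1" ]; then
            # Unquoted on purpose so the string splits into command + arguments
            $cmd || return 1
        fi
    done
}

# Dry run:   enter_mm_and_reboot
# Execute:   APPLY=1 enter_mm_and_reboot
```

Printing the commands first makes it easy to review exactly what will run on the host before a disruptive reboot.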

If the issue persists after the reboot, engage the hardware vendor to validate the health of the drives.