Virtual machines on vSAN datastores report as invalid after recovering from a power outage
The vCenter Server also resides on the vSAN datastore and is likewise marked as invalid
vSAN health indicates physical disk issues, as reported by the following command:
esxcli vsan health cluster list
Overall health findings                    red (Physical disk issue)
Physical disk                              red
  Physical disk health retrieval issues    red
  Operation health                         yellow
  Congestion                               green
  Physical disk component utilization      green
  Component metadata health                green
  Memory pools (heaps)                     green
  Memory pools (slabs)                     green
  Disk capacity                            green
Data                                       red
  vSAN object health                       red
  vSAN object format health                green
Performance service                        red
  Stats DB object                          red
  Stats primary election                   green
  Performance data collection              green
  All hosts contributing stats             green
  Stats DB object conflicts                green
Capacity utilization                       yellow

It is also observed that the objects are in an inaccessible state, due to which the virtual machines are marked as invalid. Use the below command to verify the status of the objects:
esxcli vsan debug object health summary get
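The summary groups objects by health state; in this scenario the inaccessible count is non-zero. The output resembles the following (the counts below are illustrative, not from the affected cluster):

Health Status                                 Number Of Objects
--------------------------------------------  -----------------
inaccessible                                  12
reduced-availability-with-no-rebuild          0
healthy                                       145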
On verifying the capacity utilization of the disks, it is observed that a few hosts do not have any disk groups. Note that there are no compute-only nodes in this cluster.
To check the capacity utilization per host and disk, use the below command:
# Enumerate host UUIDs and host names from CMMDS, then print per-disk usage for each host.
cmmds-tool find -t HOSTNAME -f json | egrep "uuid|hostname" | sed -e 's/\"content\"://g' | awk '{print $2}' | sed -e 's/[\",\},\,]//g' | xargs -n 2 | while read hostuuid hostname; do
  echo -e "\n\nHost Name: $hostname::: Host UUID: $hostuuid\n Disk Name\t\t| Disk UUID\t\t| Disk Usage | Disk Capacity | Usage Percentage"
  # List the disk UUIDs that CMMDS associates with this host.
  cmmds-tool find -f python -t DISK -o $hostuuid | grep uuid | cut -c 13-48 | while read diskuuid; do
    # Flatten the DISK entry; fields 37, 5, and 45 of the flattened record are the
    # disk NAA name, disk capacity, and max component count respectively.
    cmmds-tool find -f json -t DISK -o $hostuuid -u $diskuuid | egrep "uuid|content" | sed -e 's/\"content\"://g' -e 's/\"uuid\"://g' | sed -e 's/[\",\},\]//g' | awk '{printf $0}' | sed -e 's/},/\n/g' | awk '{print $37 " " $5 " " $45}' | while read disknaa diskcap maxcomp; do
      # Used capacity comes from the DISK_STATUS entry for the same disk UUID.
      diskcapused=$(cmmds-tool find -f json -t DISK_STATUS -u $diskuuid | grep content | sed -e 's/[\",\},\]//g' | awk '{print $3}')
      diskperc=$(echo "$diskcapused $diskcap" | awk '{print $1/$2*100}')
      # Only print disks that can hold components (non-zero max component count).
      if [ "$maxcomp" != 0 ]; then
        echo -en " $disknaa\t| $diskuuid\t| $diskcapused\t | $diskcap\t | $diskperc%\n"
      fi
    done
  done
done
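When run from any host in the cluster, output similar to the following can be expected (host names, UUIDs, and values below are illustrative). Note that an affected host prints only the header row, because CMMDS holds no DISK entries for it:

Host Name: esxi-01.example.local::: Host UUID: 5f1c0a1e-xxxx-xxxx-xxxx-xxxxxxxxxxxx
 Disk Name               | Disk UUID                             | Disk Usage   | Disk Capacity | Usage Percentage
 naa.6000xxxxxxxxxxxxxxx | 52a9e078-xxxx-xxxx-xxxx-xxxxxxxxxxxx  | 480098562048 | 960197124096  | 50%

Host Name: esxi-02.example.local::: Host UUID: 5f2d1b2f-xxxx-xxxx-xxxx-xxxxxxxxxxxx
 Disk Name               | Disk UUID                             | Disk Usage   | Disk Capacity | Usage Percentage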
Further, on running the below command on all the ESXi hosts in the cluster, it is observed that on a few of the hosts the disks are not recognized by CMMDS:
esxcli vsan storage list | grep -i cmmds
Sample output:
Output on an affected host:

esxcli vsan storage list | grep -i cmmds
   In CMMDS: false
   In CMMDS: false
   In CMMDS: false
   In CMMDS: false
   In CMMDS: false
   In CMMDS: false
   In CMMDS: false

Output on a healthy host:

esxcli vsan storage list | grep -i cmmds
   In CMMDS: true
   In CMMDS: true
   In CMMDS: true
   In CMMDS: true
   In CMMDS: true
   In CMMDS: true
   In CMMDS: true

VMware vSAN 8.x
This issue occurs because the Cluster Monitoring, Membership, and Directory Services (CMMDS) is unable to validate the state of the disks residing on the hosts. As a result, these disks are reporting a "Stale" or "Unknown" status within the cluster directory. Since the disks are not recognized as active members, vSAN marks the data components residing on them as Absent, causing the associated objects to lose quorum and become inaccessible.
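As an additional check of how CMMDS currently sees each disk, the vSAN debug disk commands can be compared against the esxcli vsan storage list output shown earlier; a minimal sketch:

# Per-disk state as seen by vSAN, including whether the disk is present in CMMDS
esxcli vsan debug disk list
# Condensed overview of disk health across the host
esxcli vsan debug disk summary get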
In the /var/run/log/vmkernel.log file of the affected ESXi host, events similar to the following are reported, indicating that the disks are detected as stale:
2025-11-20T05:10:47.515Z In(182) vmkernel: cpu5:153229876)PLOG: PLOGMapDataPartition:3026: Mapping SSD cache data partition for 52a9e078-xxxx-xxxx-xxxx-xxxxxxxxxxxx not found SSD device mapped:0x0 fromRescan 0x1
2025-11-20T05:10:47.516Z In(182) vmkernel: cpu7:153229876)PLOG: PLOGProbeDevice:7022: Probed plog device <naa.6000xxxxxxxxxxxxxxxxxxxxx:1> 52a9e078-xxxx-xxxx-xxxx-xxxxxxxxxxxx 0x45xxxxxxxxxx exists.. continue with old entry
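To locate these entries across the live and rotated vmkernel logs, a grep similar to the following can be used (the PLOG function names are taken from the sample above):

grep -iE "PLOGMapDataPartition|PLOGProbeDevice" /var/run/log/vmkernel.log
# Rotated copies are gzip-compressed on ESXi
zcat /var/run/log/vmkernel.*.gz 2>/dev/null | grep -iE "PLOGMapDataPartition|PLOGProbeDevice"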
Place the host into maintenance mode with the No Action option (no data is evacuated) and reboot the host.
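Because the vCenter Server itself resides on the affected datastore, these steps may need to be performed from the ESXi shell of the affected host; a minimal sketch using standard esxcli commands:

# Enter maintenance mode without evacuating vSAN data (equivalent to the No Action option)
esxcli system maintenanceMode set --enable true --vsanmode noAction
# Reboot the host; --reason is mandatory
esxcli system shutdown reboot --reason "vSAN stale disk recovery"
# Once the host is back online, exit maintenance mode
esxcli system maintenanceMode set --enable false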
If the issue persists even after rebooting the host, engage the hardware vendor to validate the health of the drives.
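In either case, once the host is back in the cluster, confirm that the disks have rejoined CMMDS and that the previously absent components resynchronize; for example:

# Disks should now report In CMMDS: true
esxcli vsan storage list | grep -i cmmds
# The inaccessible object count should return to zero as components recover
esxcli vsan debug object health summary get
# Track any remaining resynchronization activity
esxcli vsan debug resync summary get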