A few vSAN objects are reported in an Inaccessible or Unknown state following a data center-wide power outage.
Skyline Health displays an Operational Health Warning.
A capacity disk on one of the hosts is in an Absent state.
Deduplication and Compression is enabled and the entire disk group associated with the failed disk is marked as unhealthy.
The remaining objects in the cluster are resyncing, but specific objects do not progress or recover and remain in an inaccessible state.
VMware vSAN 8.x
The failure is caused by a violation of the RAID-6 (Erasure Coding) quorum requirements. In a vSAN RAID-6 (4+2) configuration, an object is distributed across six components (4 data, 2 parity). This policy allows the object to remain accessible if a maximum of two components are lost.
In this scenario, the combination of two factors led to data unavailability:
Abrupt Power Shutdown: Placed various components into a transient Absent state across multiple hosts.
Permanent Hardware Failure: A capacity disk on one of the hosts failed permanently during the power cycle.
Because deduplication is enabled, the loss of a single capacity disk invalidated the entire disk group. This resulted in more than two components becoming unavailable simultaneously for the impacted objects, exceeding the Failures to Tolerate (FTT=2) threshold.
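The tolerance arithmetic above can be sketched as a small shell check. This is a deliberately simplified model, not how vSAN itself evaluates availability: it only compares the number of unavailable components against the RAID-6 tolerance of two and ignores votes and witness logic. The function name `raid6_status` is hypothetical.

```shell
#!/bin/sh
# Simplified model (assumption): a RAID-6 (4+2) object stays accessible
# as long as no more than 2 of its components are unavailable.
# Votes and quorum details are intentionally ignored.

# $1 = number of unavailable (absent, stale, or failed) components
raid6_status() {
    unavailable="$1"
    ftt=2   # failures tolerated by RAID-6 (4+2)
    if [ "$unavailable" -le "$ftt" ]; then
        echo "accessible"
    else
        echo "inaccessible"
    fi
}
```

For example, `raid6_status 2` reports an object that is still within tolerance, while `raid6_status 3` corresponds to the situation described here, where the disk group loss pushes the count past two.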
To confirm the state of the components, run the following vSAN management command on an ESXi host in the cluster:
esxcli vsan debug object list --all --health=inaccessible
Sample Output:
Object UUID: b6833f65-0028-38e3-da1f-xxxxxxxxxxxx
   Version: 15
   Health: inaccessible - Lost data availability.(APD)
   Owner: xxxxxxx
   Size: 0.00 GB
   Used: 3.54 GB
   Policy:
   Configuration:
      RAID_6
         Component: 265e7e68-d81f-9c6a-13e1-xxxxxxxxxxxx
           Component State: ABSENT, Address Space(B): 68451041280 (63.75GB), Disk UUID: 52fab51e-d37c-22ac-6f19-xxxxxxxxxxxx, Disk Name: N/A
           Votes: 2, Host UUID: None
         Component: b6833f65-fa71-d0e6-9164-xxxxxxxxxxxx
           Component State: ACTIVE, Address Space(B): 68451041280 (63.75GB), Disk UUID: 52ce3444-e8a5-7a40-16a1-xxxxxxxxxxxx, Disk Name: naa.#############:2
           Votes: 1, Capacity Used(B): 742391808 (0.69GB), Physical Capacity Used(B): 734003200 (0.68GB), Host Name: xxxxxxxxxxxx
         Component: b6833f65-3c2a-d5e6-ab8c-xxxxxxxxxxxx
           Component State: ABSENT, CSN: STALE (981!=982), Address Space(B): 68451041280 (63.75GB), Disk UUID: 52d160bf-22ad-62d8-d3e7-xxxxxxxxxxxx, Disk Name: naa.#############:2
           Votes: 1, Capacity Used(B): 792723456 (0.74GB), Physical Capacity Used(B): 784334848 (0.73GB), Host Name: xxxxxxxxxxxx
         RAID_D
         Component: b6833f65-30b3-d9e6-c265-xxxxxxxxxxxx
           Component State: ABSENT, CSN: STALE (976!=982), Address Space(B): 68451041280 (63.75GB), Disk UUID: 528a93b2-955c-19b3-3d61-xxxxxxxxxxxx, Disk Name: naa.#############:2
           Votes: 1, Capacity Used(B): 721420288 (0.67GB), Physical Capacity Used(B): 713031680 (0.66GB), Host Name: xxxxxxxxxxxx
         Component: 7c02d769-ced8-68d8-008e-xxxxxxxxxxxx
           Component State: ABSENT, CSN: STALE (978!=982), Address Space(B): 68451041280 (63.75GB), Disk UUID: 5225b0e2-d929-11ea-a2ab-xxxxxxxxxxxx, Disk Name: naa.#############:2
           Votes: 1, Capacity Used(B): 25165824 (0.02GB), Physical Capacity Used(B): 20971520 (0.02GB), Host Name: xxxxxxxxxxxx
         Component: b6833f65-48a4-dde6-90c3-xxxxxxxxxxxx
           Component State: ACTIVE, Address Space(B): 68451041280 (63.75GB), Disk UUID: 52abf5be-ba18-800e-63db-xxxxxxxxxxxx, Disk Name: naa.#############:2
           Votes: 1, Capacity Used(B): 801112064 (0.75GB), Physical Capacity Used(B): 792723456 (0.74GB), Host Name: xxxxxxxxxxxx
         Component: d77a7565-22b0-5b2b-8644-xxxxxxxxxxxx
           Component State: ACTIVE, Address Space(B): 68451041280 (63.75GB), Disk UUID: 5298aabe-25b8-d39f-725d-xxxxxxxxxxxx, Disk Name: naa.#############:2
           Votes: 1, Capacity Used(B): 759169024 (0.71GB), Physical Capacity Used(B): 750780416 (0.70GB), Host Name: xxxxxxxxxxxx
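When many objects are affected, reading this output by eye gets tedious. The following is a minimal sketch for summarizing a saved copy of the output with standard text tools. The function names and the file path are hypothetical, and the patterns assume the field labels shown in the sample above (`Object UUID:`, `Component State:`, `CSN: STALE`).

```shell
#!/bin/sh
# Assumes the command output was first saved to a file, for example:
#   esxcli vsan debug object list --all --health=inaccessible > /tmp/inaccessible.txt
# (/tmp/inaccessible.txt is a placeholder path.)

# Print each Object UUID found in the saved output, one per line.
# The character class includes "x" so redacted samples also match.
list_object_uuids() {
    grep -o 'Object UUID: [0-9a-fx-]*' "$1" | awk '{print $3}'
}

# Print "<STATE>=<count>" for every component state in the saved output.
count_component_states() {
    grep -o 'Component State: [A-Z_]*' "$1" \
        | awk '{print $3}' | sort | uniq -c \
        | awk '{print $2 "=" $1}'
}

# Print how many components are flagged with a stale CSN.
count_stale_components() {
    grep -o 'CSN: STALE' "$1" | wc -l | tr -d ' '
}
```

Run against the sample output above, `count_component_states` reports `ABSENT=4` and `ACTIVE=3`, and `count_stale_components` reports `3`. Because the patterns use `grep -o`, the counts do not depend on how the output is wrapped across lines.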
To address the inaccessible objects and the disk failure, perform the following steps.
1. Restore from Backup
Since more than two components of the RAID-6 stripe are missing or permanently lost due to the disk group failure, the data for these objects is mathematically incomplete and cannot be recovered by the vSAN layer.
Identify the Virtual Machines associated with the inaccessible Object IDs.
Initiate a Restore from Backup for the impacted VMs.
2. Hardware Remediation
Place the host with the failed capacity disk into maintenance mode using the Ensure accessibility option.
Delete the unhealthy disk group.
Replace the failed capacity disk.
Recreate the disk group once the hardware is healthy to restore the cluster to full capacity and redundancy.
Monitor the vSAN Resyncing Objects dashboard to ensure all other data has finished resyncing.
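For administrators who prefer the ESXi command line, the hardware remediation steps above can be sketched roughly as follows. This is an illustrative outline, not a validated procedure: the function names and device names are hypothetical placeholders, the script only prints the commands unless `DRY_RUN=0` is set, and each esxcli option should be checked with `--help` on the target build before use. It mirrors only the steps listed above.

```shell
#!/bin/sh
# Dry-run by default: commands are printed with a leading "+" instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps performed before the physical disk swap.
# $1 = cache device of the unhealthy disk group (placeholder name)
pre_replacement() {
    # Enter maintenance mode with the "Ensure accessibility" data option.
    run esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility
    # Review claimed devices to confirm which disk group is affected.
    run esxcli vsan storage list
    # Removing the cache device removes the whole disk group.
    run esxcli vsan storage remove --ssd "$1"
}

# Steps performed after the new capacity disk is installed.
# $1 = cache device, remaining arguments = capacity devices (placeholders)
post_replacement() {
    cache="$1"; shift
    disk_args=""
    for d in "$@"; do
        disk_args="$disk_args --disks $d"
    done
    # Word splitting of $disk_args is intended; device names contain no spaces.
    run esxcli vsan storage add --ssd "$cache" $disk_args
    # Check resync progress for the remaining objects.
    run esxcli vsan debug resync summary get
}
```

Calling `pre_replacement naa.CACHE` and, after the swap, `post_replacement naa.CACHE naa.CAP1 naa.CAP2` prints the planned commands for review. Keeping dry-run as the default is a deliberate choice, since removing a disk group is destructive.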