VMs with large-sized vmdks on vSAN ESA may become inaccessible

Article ID: 326662


Products

VMware vSAN

Issue/Introduction

Symptoms:
A critical issue can affect objects with large used capacity on vSAN Express Storage Architecture (ESA) versions 8.0, 8.0U1, 8.0U1c, and 8.0U2. While scrubbing vmdk objects with large used capacity, the scrubber may start too many concurrent operations to persist its progress, eventually causing those objects to become inaccessible. Affected VMs must be recovered from backups, or by working with vSAN customer support to make the objects accessible again.

Environment

VMware vSAN 8.0.x

Cause

The object(s) become inaccessible when vSAN components are marked ABSENT because internal metadata operations (CCPs) fail due to a race condition with the scrubber.

Resolution

VMware strongly recommends upgrading vSAN ESA clusters to 8.0U2b, which contains the fix for this issue.

Workaround:
If an upgrade to 8.0U2b is not immediately possible, VMware recommends the following steps:

Run the following command on every host in the cluster to prevent additional objects from falling into the same condition, and do not change the setting back until the cluster has been upgraded to 8.0U2b:

esxcfg-advcfg -s 0 /VSAN/ObjectScrubPersistMin

Note: This change prevents objects from falling into this state and does not impact any process required for I/O operations or vSAN management.
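If SSH is enabled on the hosts, a loop like the one below can apply the setting cluster-wide. This is a sketch only; the host names are placeholders for the ESXi hosts in your cluster:

for host in esxi01.example.com esxi02.example.com esxi03.example.com; do
  # Set ObjectScrubPersistMin to 0 on each host (placeholder host names)
  ssh root@"$host" 'esxcfg-advcfg -s 0 /VSAN/ObjectScrubPersistMin'
done

The current value can be verified on each host with esxcfg-advcfg -g /VSAN/ObjectScrubPersistMin.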

Once the cluster has been upgraded to the fixed version, 8.0U2b, run the following command on every host in the cluster to revert the setting to its default value:

esxcfg-advcfg -d /VSAN/ObjectScrubPersistMin
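The same loop pattern can be used for the revert, again assuming SSH access and substituting your own host names:

for host in esxi01.example.com esxi02.example.com esxi03.example.com; do
  # Restore the default ObjectScrubPersistMin value on each host (placeholder host names)
  ssh root@"$host" 'esxcfg-advcfg -d /VSAN/ObjectScrubPersistMin'
done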



Additional Information

Impact/Risks:
  • Virtual machines whose vmdks have large used capacity may fail to power on or may lose access to their vmdks following scrubber operations
  • Virtual machines may appear greyed out in vCenter
  • Further investigation shows that the backing vSAN ESA vmdk object is in an inaccessible state
  • The likelihood of encountering this issue gradually increases over time as a vmdk's used capacity increases