There is no method to prevent the logical (physical) failure of a disk's blocks as SSDs degrade over time. Therefore, when a read failure occurs in the metadata or dedupe metadata region, vSAN fails out the disk, or the whole disk group if dedupe is enabled.
The process presented below has no additional impacts or risks. The underlying problem, however, could result in a DU (Data Unavailable) or DL (Data Loss) situation if multiple disk groups experience the same issue, or if redundancy is lost for another reason after the disk group fails but before vSAN has rebuilt its data onto another disk or disk group. Once the physical disk blocks have failed and caused the unrecovered read error, there is no method to recover the disk group intact.
Workaround:
ESXi/vSAN 6.7 P03, 7.0 Update 1, and newer releases include a feature called autoDG Creation. In these releases the feature is disabled by default.
Note: In 7.0U3c and newer, the feature is enabled by default.
This feature works only with all-flash vSAN disk groups. It allows a disk group to be recreated after it encounters an Unrecovered Read Error (URE), also reported as a Medium Error:
H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0 (SCSI sense code for the Medium Error)
Note: If your build is older than the versions listed above, please open a case with VMware vSAN Support, and we will use a manual process to achieve the same result.
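The sense data above can be decoded mechanically: device status 0x2 is CHECK CONDITION, sense key 0x3 is MEDIUM ERROR, and additional sense code (ASC) 0x11 is UNRECOVERED READ ERROR. The following is a minimal POSIX shell sketch, assuming log lines follow the layout quoted above; the function name is_ure_line is hypothetical and not part of ESXi.

```shell
#!/bin/sh
# Hypothetical helper: prints "yes" if a SCSI completion line reports a
# medium error / unrecovered read error (URE), otherwise "no".
# Assumes the "D:0x.. ... Valid sense data: KEY ASC ASCQ" layout shown above.
is_ure_line() {
    line=$1
    # Device status 0x2 = CHECK CONDITION, meaning sense data is present.
    case $line in
        *"D:0x2 "*) ;;
        *) echo no; return 0 ;;
    esac
    # Keep only the bytes after "Valid sense data: ".
    sense=${line#*Valid sense data: }
    if [ "$sense" = "$line" ]; then
        echo no; return 0
    fi
    # Split into sense key, ASC, ASCQ.
    set -- $sense
    key=${1:-} asc=${2:-}
    # Sense key 0x3 = MEDIUM ERROR; ASC 0x11 = UNRECOVERED READ ERROR.
    if [ "$key" = "0x3" ] && [ "$asc" = "0x11" ]; then
        echo yes
    else
        echo no
    fi
}
```

For example, `is_ure_line "H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0"` prints `yes`, while a line with a different sense key prints `no`.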
Once the feature is turned on, TRIM is performed automatically over the metadata/dedupe metadata region of the disk (configurable to cover the full disk), so that the bad blocks are no longer used after the disk group is recreated. This ensures that the newly created disk group does not run into the same failed blocks.
The automated workflow then re-adds the trimmed disk to the disk group, so the customer no longer needs to intervene manually to re-add it.
Some important things to keep in mind:
- This feature is available only on “All Flash vSAN” and not “Hybrid vSAN”.
- This is a per-host setting and must be enabled individually on each host where it is needed.
- If multiple disks in a disk group have this issue, they are trimmed sequentially, one disk at a time, not in parallel.
- A working copy of the data on that disk group must be present on a separate host for the rebuild to succeed after the disk group is recreated. Auto disk group creation also follows the vSAN component absent workflow; that is, the rebuild that makes the cluster compliant again starts only after a 60-minute wait.
- Once enabled, these settings persist across reboots and upgrades.
To enable the feature, set the following three options on every host that needs it. The third option is only required if disk group creation fails with TRIM enabled because a capacity device does not support TRIM.
- /LSOM/lsomEnableRebuildOnLSE [Integer] : Enable auto rebuild of the disk/DG on detecting LSE errors. Acceptable values are either 1 or 2. If ESXi was upgraded from 6.7 to 7.0U1 or later, you will see a value of 2.
- /VSAN/TrimDisksBeforeUseGranularity [Integer] : Trim the devices (if supported) before using them for vSAN. 0=Disable, 1=MetaData only, 2=Full Disk
- /VSAN/WriteZeroOnTrimUnsupported [Integer] : Enable writing zeros on capacity devices that do not support TRIM (requires TrimDisksBeforeUseGranularity to be enabled)
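The option descriptions above can be captured as a small check of which values switch each option on, for example before scripting a change. This is a minimal sketch; the function name enables_option is hypothetical, and it accepts only the values the article describes for enabling (1 or 2, 1 or 2, and 1 respectively).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if VALUE ($2) turns OPTION ($1) on,
# per the option descriptions above. 0 (disable) is rejected.
enables_option() {
    case "$1=$2" in
        /LSOM/lsomEnableRebuildOnLSE=1 | /LSOM/lsomEnableRebuildOnLSE=2) return 0 ;;
        /VSAN/TrimDisksBeforeUseGranularity=1) return 0 ;;  # metadata only
        /VSAN/TrimDisksBeforeUseGranularity=2) return 0 ;;  # full disk
        /VSAN/WriteZeroOnTrimUnsupported=1) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, `enables_option /VSAN/TrimDisksBeforeUseGranularity 0` fails, because 0 disables TRIM.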
The current settings can be checked as follows:
- esxcfg-advcfg -g /LSOM/lsomEnableRebuildOnLSE
- esxcfg-advcfg -g /VSAN/TrimDisksBeforeUseGranularity
- esxcfg-advcfg -g /VSAN/WriteZeroOnTrimUnsupported
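The three checks above can be run in one pass. This is a minimal sketch that only wraps the esxcfg-advcfg -g calls shown; the function name show_autodg_settings and the ADVCFG variable are hypothetical, and ADVCFG exists only so the function can be exercised off-host with a stub.

```shell
#!/bin/sh
# ADVCFG defaults to the esxcfg-advcfg tool used above (hypothetical
# override hook for testing off-host).
ADVCFG=${ADVCFG:-esxcfg-advcfg}

# Print the current value of each autoDG-related option on this host,
# prefixed with the option path.
show_autodg_settings() {
    for opt in /LSOM/lsomEnableRebuildOnLSE \
               /VSAN/TrimDisksBeforeUseGranularity \
               /VSAN/WriteZeroOnTrimUnsupported; do
        printf '%s: ' "$opt"
        "$ADVCFG" -g "$opt"
    done
}
```

Running `show_autodg_settings` on each host gives a quick per-host view, which matters because the settings are applied per host.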
The options can be enabled as follows:
- esxcfg-advcfg -s 1 /LSOM/lsomEnableRebuildOnLSE
- esxcfg-advcfg -s 1 /VSAN/TrimDisksBeforeUseGranularity
- esxcfg-advcfg -s 1 /VSAN/WriteZeroOnTrimUnsupported
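The same commands can be wrapped so the optional third setting is applied only when needed. This is a minimal sketch, assuming the values shown above (1 for each option); the function name enable_autodg, its "zero" argument, and the ADVCFG variable are hypothetical and exist only for illustration and off-host testing.

```shell
#!/bin/sh
# ADVCFG defaults to the esxcfg-advcfg tool used above (hypothetical
# override hook for testing off-host).
ADVCFG=${ADVCFG:-esxcfg-advcfg}

# Enable autoDG creation on this host. Pass "zero" to also enable
# WriteZeroOnTrimUnsupported, which is only needed when a capacity
# device does not support TRIM. Stops at the first failing command.
enable_autodg() {
    "$ADVCFG" -s 1 /LSOM/lsomEnableRebuildOnLSE || return 1
    # WriteZeroOnTrimUnsupported requires TRIM granularity to be enabled,
    # so set it first. 1 = metadata only (2 would trim the full disk).
    "$ADVCFG" -s 1 /VSAN/TrimDisksBeforeUseGranularity || return 1
    if [ "${1:-}" = zero ]; then
        "$ADVCFG" -s 1 /VSAN/WriteZeroOnTrimUnsupported || return 1
    fi
}
```

On a host, `enable_autodg` applies the first two settings, and `enable_autodg zero` applies all three.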
If a disk group has already failed out of use due to a URE, enable the options above first and then recreate the disk group manually, so that it can be recreated automatically if it fails again due to a URE in a metadata region.