vSAN Disk Or Diskgroup Fails With Medium Errors

Article ID: 326767

Products

VMware vSAN

Issue/Introduction

This KB is intended to help identify whether a vSAN disk or disk group has failed due to medium errors detected within the metadata or dedupe metadata region of the disk, and to assist in resolving future occurrences.


Symptoms:
You see messages similar to the following in vmkernel.log:

2020-08-12T13:16:13.170Z cpu1:1000341424)ScsiDeviceIO: SCSICompleteDeviceCommand:4267: Cmd(0x45490152eb40) 0x28, CmdSN 0x11 from world 0 to dev "mpx.vmhba0:C0:T3:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x10 0x0
2020-08-12T13:16:13.170Z cpu1:1000341424)LSOMCommon: IORETRY_handleCompletionOnError:1723: Throttled:  0x454bc05ff900 IO type 264 (READ) isOrdered:NO isSplit:NO isEncr:NO since 0 msec status Read error

2020-08-21T06:57:24.333Z cpu0:1000341425)ScsiDeviceIO: SCSICompleteDeviceCommand:4267: Cmd(0x4549015c8840) 0x2a, CmdSN 0x6 from world 0 to dev "mpx.vmhba0:C0:T3:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x3 0x0
2020-08-21T06:57:24.333Z cpu0:1000341425)LSOMCommon: IORETRYCompleteIO:470: Throttled:  0x454beebff940 IO type 304 (WRITE) isOrdered:NO isSplit:YES isEncr:NO since 0 msec status Write error

2019-11-03T11:16:06.462Z cpu56:66446)NMP: nmp_ThrottleLogForDevice:3616: Cmd 0x28 (0x439dc176a8c0, 0) to dev "mpx.vmhba0:C2:T1:L0" on path "vmhba0:C2:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0. Act:NONE
2019-11-03T11:16:06.462Z cpu56:66446)ScsiDeviceIO: 3015: Cmd(0x439dc176a8c0) 0x28, CmdSN 0x19b2 from world 0 to dev "mpx.vmhba0:C2:T1:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0.
2019-11-03T11:16:06.462Z cpu56:66446)LSOMCommon: IORETRY_handleCompletionOnError:2461: Throttled:  0x439862b7e640 IO type 264 (READ) isOrdered:NO isSplit:YES isEncr:YES since 23 msec status I/O error

These may be followed by messages similar to:
2019-11-03T11:16:06.462Z cpu46:66973)WARNING: PLOG: PLOGPropagateErrorInt:2821: Permanent error event on 5290f0b2-e0fb-1e42-b57e-c379e3f95b18

If deduplication and compression are enabled, similar messages such as the following may also appear:
2019-11-03T11:16:06.462Z cpu3:67299)WARNING: PLOG: DDPCompleteDDPWrite:2992: Throttled: DDP write failed I/O error callback [email protected]#0.0.0.1
2019-11-03T11:16:06.462Z cpu3:67299)WARNING: PLOG: PLOGDDPCallbackFn:234: Throttled: DDP write failed I/O error
2019-11-03T11:16:06.462Z cpu46:66973)WARNING: PLOG: PLOGPropagateError:2880: DDP: Propagating error state from original device mpx.vmhba0:C2:T2:L0:2
2019-11-03T11:16:06.462Z cpu46:66973)WARNING: PLOG: PLOGPropagateError:2921: DDP: Propagating error state to MDs in device naa.5000cca09b0136c4:2
2019-11-03T11:16:06.462Z cpu13:11600681)LSOM: LSOMEventNotify:6734: Throttled: Event 2: waiting for mount helper for disk 5290f0b2-e0fb-1e42-b57e-c379e3f95b18
2019-11-03T11:16:06.462Z cpu13:11600681)LSOM: LSOMLogDiskEvent:5759: Disk Event permanent error propagated for MD 5297464a-c298-cdea-405c-8f0529f1d291 (mpx.vmhba0:C2:T1:L0:2)
2019-11-03T11:16:06.462Z cpu13:11600681)WARNING: LSOM: LSOMEventNotify:6886: Virtual SAN device 5297464a-c298-cdea-405c-8f0529f1d291 is under propagated permanent error.
2019-11-03T11:16:06.462Z cpu13:11600681)LSOM: LSOMLogDiskEvent:5759: Disk Event permanent error propagated for SSD 5293a19c-30c9-04f5-0b0e-d6ab7b4a3e0e (naa.5000cca09b0136c4:2)
2019-11-03T11:16:06.462Z cpu13:11600681)WARNING: LSOM: LSOMEventNotify:6886: Virtual SAN device 5293a19c-30c9-04f5-0b0e-d6ab7b4a3e0e is under propagated permanent error.

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
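
As a quick way to check whether a host is logging these signatures, the live vmkernel log can be searched from the ESXi shell. The following is only an illustrative search; the log path (/var/run/log/vmkernel.log) and patterns may need to be adjusted for your environment and for rotated or archived logs:

  # Search the live vmkernel log for the error signatures shown above
  grep -iE "PLOGPropagateError|permanent error|DDP write failed" /var/run/log/vmkernel.log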

Environment

VMware vSAN 6.x
VMware vSAN 7.0.x

Cause

When a block in the vSAN metadata or vSAN dedupe metadata region of a disk (the first 5-10% of the disk) cannot be read, vSAN marks the disk as having a permanent error. Without this metadata, vSAN cannot properly track where data resides on the disk, or where deduplicated objects actually reside within the disk group.

If the size of the disk group does not change, the same areas of the disk are used for metadata. This means that after a host reboot or disk group recreation, the bad block(s) still fall within the metadata region of the disk, and the failure will recur until the bad block or blocks have been remapped by the disk firmware.
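
In the log excerpts above, the sense data 0x3 0x11 0x0 decodes to sense key 0x3 (Medium Error) with ASC 0x11 (unrecovered read error). As an illustrative health check of the affected physical device, SMART data can be queried from the ESXi shell; the device name below is taken from the example logs above and should be replaced with the affected device in your environment:

  # Check SMART attributes (for example reallocated sector count, media wearout) on the suspect device
  esxcli storage core device smart get -d naa.5000cca09b0136c4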

Resolution


There is no method to prevent the logical (physical) failure of a disk's blocks as SSDs degrade over time. Therefore, when a read failure is experienced in the metadata or dedupe metadata region, vSAN fails the disk, or the entire disk group if dedupe is enabled.

Workaround:
As of ESXi/vSAN 6.7 P03 and 7.0 Update 1 and newer, a feature called autoDG Creation was introduced. This feature is disabled by default.

Note: As of version 7.0 U3c and newer, the feature is enabled by default.

This feature works only with all-flash vSAN disk groups. It allows the disk group to be recreated after encountering an Unrecovered Read Error/Medium Error (URE).

Note: If your build is older than the listed versions, please open a case with VMware vSAN Support and we will utilize a manual process to provide the same end result.

Once it is turned on, TRIM is performed automatically over the metadata/dedupe metadata region of the disk (configurable to the full disk) so that bad blocks no longer participate after the disk group is recreated. This is to ensure that the user does not experience any issues with the newly created disk group. 

The automated workflow re-adds the TRIMmed disk to the disk group, removing the need for the customer to manually intervene and re-add the disk.
 
Some important things to keep in mind:
  1. This feature is available only on “All Flash vSAN” and not “Hybrid vSAN”.
  2. This is a per-host setting and must be enabled individually on each host where it is needed.
  3. If multiple disks in a disk group have this issue, the disks are TRIMmed one at a time sequentially, not in parallel.
  4. Note that a working copy of the data on that disk group must be present on a separate host for the rebuild operation to succeed after the disk group is recreated. Furthermore, auto disk group creation follows the vSAN component-absent workflow, i.e. the rebuild of the disk group starts after a 60-minute wait to make the cluster compliant again.
  5. Once enabled, these settings are persistent across reboots and upgrades.
To enable the feature, use the following three options on all hosts that need this performed. The third option is only required if the disk group creation fails with TRIM enabled because the capacity devices do not support TRIM:
  1. /LSOM/lsomEnableRebuildOnLSE [Integer]: Enables auto rebuild of the disk/disk group on detecting LSE (Latent Sector Error) errors. Acceptable values are either 1 or 2. If ESXi was upgraded from 6.7 to 7.0 U1 or higher, you will see a value of 2.
  2. /VSAN/TrimDisksBeforeUseGranularity [Integer]: Trims the devices (if supported) before using them for vSAN. 0 = Disable, 1 = Metadata only, 2 = Full disk
  3. /VSAN/WriteZeroOnTrimUnsupported [Integer]: Enables writing zeros on capacity devices that do not support TRIM (requires TrimDisksBeforeUseGranularity to be enabled)
Current settings may be found in the following manner:
  1. esxcfg-advcfg -g /LSOM/lsomEnableRebuildOnLSE
  2. esxcfg-advcfg -g /VSAN/TrimDisksBeforeUseGranularity
  3. esxcfg-advcfg -g /VSAN/WriteZeroOnTrimUnsupported
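
For reference, querying a setting returns output similar to the following (the exact wording may vary slightly by build); a value of 0 means the option is disabled:

  esxcfg-advcfg -g /LSOM/lsomEnableRebuildOnLSE
  Value of lsomEnableRebuildOnLSE is 0
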
These may be enabled in the following manner:
  1. esxcfg-advcfg -s 1 /LSOM/lsomEnableRebuildOnLSE
  2. esxcfg-advcfg -s 1 /VSAN/TrimDisksBeforeUseGranularity
  3. esxcfg-advcfg -s 1 /VSAN/WriteZeroOnTrimUnsupported
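
The same advanced options can also be set and verified with esxcli, which may be more convenient when scripting the change across multiple hosts. This is an illustrative equivalent for the first option only:

  # Set the advanced option to 1, then list it to confirm the new value
  esxcli system settings advanced set -o /LSOM/lsomEnableRebuildOnLSE -i 1
  esxcli system settings advanced list -o /LSOM/lsomEnableRebuildOnLSE
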
If a disk group has already failed out of use due to a URE, recreate the disk group after the above options have been enabled. That way, if the disk group fails out of use again in the future due to a URE in a metadata region, it will be recreated automatically.
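
To confirm which disks the host claims for vSAN and their disk group membership before and after the disk group is recreated, the vSAN storage inventory can be listed from the ESXi shell:

  # List vSAN-claimed disks and their disk group membership on this host
  esxcli vsan storage list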

Additional Information

Impact/Risks:
The process presented in this article has no additional impact or risk. The underlying problem could result in a data unavailability (DU) or data loss (DL) situation if multiple disk groups experience the same issue, or if there is another cause of redundancy loss when the disk group fails before vSAN has rebuilt the data to another disk or disk group. There is no method to recover the disk group intact once the physical disk blocks have failed and led to the unrecovered read error.