vSAN Disk Or Diskgroup Fails With Medium Errors

Article ID: 326767

Products

VMware vSAN

Issue/Introduction

This KB is intended to help identify whether a vSAN disk or disk group has failed due to medium errors detected within the metadata or dedupe metadata region of the disk, and to assist in resolving future occurrences.


Symptoms:
You see messages similar to the following in vmkernel.log:

2020-08-12T13:16:13.170Z cpu1:1000341424)ScsiDeviceIO: SCSICompleteDeviceCommand:4267: Cmd(0x45490152eb40) 0x28, CmdSN 0x11 from world 0 to dev "mpx.vmhba0:C0:T3:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x10 0x0
2020-08-12T13:16:13.170Z cpu1:1000341424)LSOMCommon: IORETRY_handleCompletionOnError:1723: Throttled:  0x454bc05ff900 IO type 264 (READ) isOrdered:NO isSplit:NO isEncr:NO since 0 msec status Read error

2020-08-21T06:57:24.333Z cpu0:1000341425)ScsiDeviceIO: SCSICompleteDeviceCommand:4267: Cmd(0x4549015c8840) 0x2a, CmdSN 0x6 from world 0 to dev "mpx.vmhba0:C0:T3:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x3 0x0
2020-08-21T06:57:24.333Z cpu0:1000341425)LSOMCommon: IORETRYCompleteIO:470: Throttled:  0x454beebff940 IO type 304 (WRITE) isOrdered:NO isSplit:YES isEncr:NO since 0 msec status Write error

2019-11-03T11:16:06.462Z cpu56:66446)NMP: nmp_ThrottleLogForDevice:3616: Cmd 0x28 (0x439dc176a8c0, 0) to dev "mpx.vmhba0:C2:T1:L0" on path "vmhba0:C2:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0. Act:NONE
2019-11-03T11:16:06.462Z cpu56:66446)ScsiDeviceIO: 3015: Cmd(0x439dc176a8c0) 0x28, CmdSN 0x19b2 from world 0 to dev "mpx.vmhba0:C2:T1:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0.
2019-11-03T11:16:06.462Z cpu56:66446)LSOMCommon: IORETRY_handleCompletionOnError:2461: Throttled:  0x439862b7e640 IO type 264 (READ) isOrdered:NO isSplit:YES isEncr:YES since 23 msec status I/O error

These may be followed by messages similar to:
2019-11-03T11:16:06.462Z cpu46:66973)WARNING: PLOG: PLOGPropagateErrorInt:2821: Permanent error event on 5290f0b2-e0fb-1e42-b57e-c379e3f95b18

If deduplication and compression are enabled, similar messages such as the following may also appear:
2019-11-03T11:16:06.462Z cpu3:67299)WARNING: PLOG: DDPCompleteDDPWrite:2992: Throttled: DDP write failed I/O error callback [email protected]#0.0.0.1
2019-11-03T11:16:06.462Z cpu3:67299)WARNING: PLOG: PLOGDDPCallbackFn:234: Throttled: DDP write failed I/O error
2019-11-03T11:16:06.462Z cpu46:66973)WARNING: PLOG: PLOGPropagateError:2880: DDP: Propagating error state from original device mpx.vmhba0:C2:T2:L0:2
2019-11-03T11:16:06.462Z cpu46:66973)WARNING: PLOG: PLOGPropagateError:2921: DDP: Propagating error state to MDs in device naa.5000cca09b0136c4:2
2019-11-03T11:16:06.462Z cpu13:11600681)LSOM: LSOMEventNotify:6734: Throttled: Event 2: waiting for mount helper for disk 5290f0b2-e0fb-1e42-b57e-c379e3f95b18
2019-11-03T11:16:06.462Z cpu13:11600681)LSOM: LSOMLogDiskEvent:5759: Disk Event permanent error propagated for MD 5297464a-c298-cdea-405c-8f0529f1d291 (mpx.vmhba0:C2:T1:L0:2)
2019-11-03T11:16:06.462Z cpu13:11600681)WARNING: LSOM: LSOMEventNotify:6886: Virtual SAN device 5297464a-c298-cdea-405c-8f0529f1d291 is under propagated permanent error.
2019-11-03T11:16:06.462Z cpu13:11600681)LSOM: LSOMLogDiskEvent:5759: Disk Event permanent error propagated for SSD 5293a19c-30c9-04f5-0b0e-d6ab7b4a3e0e (naa.5000cca09b0136c4:2)
2019-11-03T11:16:06.462Z cpu13:11600681)WARNING: LSOM: LSOMEventNotify:6886: Virtual SAN device 5293a19c-30c9-04f5-0b0e-d6ab7b4a3e0e is under propagated permanent error.

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
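
As a quick way to check whether a host is logging these signatures, the live vmkernel log can be searched from the ESXi shell. The following is only an illustrative search; the log path (/var/run/log/vmkernel.log) and patterns may need to be adjusted for your environment and for rotated or archived logs:

  # Search the live vmkernel log for the error signatures shown above
  grep -iE "PLOGPropagateError|permanent error|DDP write failed" /var/run/log/vmkernel.log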

Environment

VMware vSAN 6.x
VMware vSAN 7.0.x

Cause

When a block in the vSAN metadata or vSAN dedupe metadata region of a disk (the first 5-10% of the disk) cannot be read, vSAN marks the disk as having a permanent error. Without this metadata, vSAN cannot properly track where data resides on the disk, or where deduplicated objects actually reside within the disk group.

If the size of the disk group does not change, the same areas of the disk are used for metadata. This means that after a host reboot or disk group recreation, the bad block(s) still fall within the metadata region of the disk, and the failure will recur until the bad block or blocks have been remapped by the disk firmware.
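
In the log excerpts above, the sense data 0x3 0x11 0x0 decodes to sense key 0x3 (Medium Error) with ASC 0x11 (unrecovered read error). As an illustrative health check of the affected physical device, SMART data can be queried from the ESXi shell; the device name below is taken from the example logs above and should be replaced with the affected device in your environment:

  # Check SMART attributes (for example reallocated sector count, media wearout) on the suspect device
  esxcli storage core device smart get -d naa.5000cca09b0136c4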

Resolution


There is no method to prevent the logical (physical) failure of a disk's blocks as SSDs degrade over time. Therefore, when a read failure is experienced in the metadata or dedupe metadata region, vSAN fails the disk, or the entire disk group if dedupe is enabled.

Workaround:
As of ESXi/vSAN 6.7 P03 and 7.0 Update 1 and newer, a feature called autoDG Creation was introduced. This feature is disabled by default.

Note: As of version 7.0 U3c and newer, the feature is enabled by default.

This feature works only with all-flash vSAN disk groups. It allows the disk group to be recreated after encountering an Unrecovered Read Error/Medium Error (URE).

Note: If your build is older than the listed versions, please open a case with VMware vSAN Support and we will utilize a manual process to provide the same end result.

Once it is turned on, TRIM is performed automatically over the metadata/dedupe metadata region of the disk (configurable to the full disk) so that bad blocks no longer participate after the disk group is recreated. This is to ensure that the user does not experience any issues with the newly created disk group. 

The automated workflow re-adds the TRIMmed disk to the disk group, removing the need for the customer to manually intervene and re-add the disk.
 
Some important things to keep in mind:
  1. This feature is available only on “All Flash vSAN” and not “Hybrid vSAN”.
  2. This is a per-host setting and must be enabled individually on each host where it is needed.
  3. If multiple disks in a disk group have this issue, the disks are TRIMmed one at a time sequentially, not in parallel.
  4. Note that a working copy of the data on that disk group must be present on a separate host for the rebuild operation to succeed after the disk group is recreated. Furthermore, auto disk group creation follows the vSAN component-absent workflow, i.e. the rebuild of the disk group starts after a 60-minute wait to make the cluster compliant again.
  5. Once enabled, these settings are persistent across reboots and upgrades.
To enable the feature, use the following three options on all hosts that need this performed. The third option is only required if the disk group creation fails with TRIM enabled because the capacity devices do not support TRIM:
  1. /LSOM/lsomEnableRebuildOnLSE [Integer]: Enables auto rebuild of the disk/disk group on detecting LSE (Latent Sector Error) errors. Acceptable values are either 1 or 2. If ESXi was upgraded from 6.7 to 7.0 U1 or higher, you will see a value of 2.
  2. /VSAN/TrimDisksBeforeUseGranularity [Integer]: Trims the devices (if supported) before using them for vSAN. 0 = Disable, 1 = Metadata only, 2 = Full disk
  3. /VSAN/WriteZeroOnTrimUnsupported [Integer]: Enables writing zeros on capacity devices that do not support TRIM (requires TrimDisksBeforeUseGranularity to be enabled)
Current settings may be found in the following manner:
  1. esxcfg-advcfg -g /LSOM/lsomEnableRebuildOnLSE
  2. esxcfg-advcfg -g /VSAN/TrimDisksBeforeUseGranularity
  3. esxcfg-advcfg -g /VSAN/WriteZeroOnTrimUnsupported
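
For reference, querying a setting returns output similar to the following (the exact wording may vary slightly by build); a value of 0 means the option is disabled:

  esxcfg-advcfg -g /LSOM/lsomEnableRebuildOnLSE
  Value of lsomEnableRebuildOnLSE is 0
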
These may be enabled in the following manner:
  1. esxcfg-advcfg -s 1 /LSOM/lsomEnableRebuildOnLSE
  2. esxcfg-advcfg -s 1 /VSAN/TrimDisksBeforeUseGranularity
  3. esxcfg-advcfg -s 1 /VSAN/WriteZeroOnTrimUnsupported
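
The same advanced options can also be set and verified with esxcli, which may be more convenient when scripting the change across multiple hosts. This is an illustrative equivalent for the first option only:

  # Set the advanced option to 1, then list it to confirm the new value
  esxcli system settings advanced set -o /LSOM/lsomEnableRebuildOnLSE -i 1
  esxcli system settings advanced list -o /LSOM/lsomEnableRebuildOnLSE
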
If a disk group has already failed out of use due to a URE, recreate the disk group after the above options have been enabled. That way, if the disk group fails out of use again in the future due to a URE in a metadata region, it will be recreated automatically.
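
To confirm which disks the host claims for vSAN and their disk group membership before and after the disk group is recreated, the vSAN storage inventory can be listed from the ESXi shell:

  # List vSAN-claimed disks and their disk group membership on this host
  esxcli vsan storage list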

Additional Information

Impact/Risks:
The process presented in this article has no additional impact or risk. The underlying problem could result in a data unavailability (DU) or data loss (DL) situation if multiple disk groups experience the same issue, or if there is another cause of redundancy loss when the disk group fails before vSAN has rebuilt the data to another disk or disk group. There is no method to recover the disk group intact once the physical disk blocks have failed and led to the unrecovered read error.