"vdq -iH", throws an error suggesting a device cannot be opened.[root@esx-0l :~ ] vdq -iHVsanUtil: : ReadFromDevice: Failed to open , errno (2)VsanUtil: : GetVsanStoragePoolDisks: Error occurred 'Failed to open device ', create disk with null idSingleTierDisks: "singleTier" : [ "eui.################1###############", "eui.################2###############", "eui.################3###############", "eui.################4###############", "eui.################5###############", "eui.################6###############", "eui.################7###############", "eui.################8###############", "eui.################9###############", "eui.################10##############", "eui.################11##############", "eui.################12##############", "eui.################13##############", "eui.################14##############", "eui.################15##############", "eui.################16##############" ]
[root@esx-0l :~ ] esxcli vsan storagepool list | grep -i cmmdsIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: trueIn CMMDS: falseVMware vSphere vSAN 8.x
IO is stuck outside of ESXi (in the controller or firmware) and neither completes nor responds to the abort request. If the device or controller does not respond to the abort within 120 seconds (the default timeout), vSAN takes the disk or disk group offline to avoid affecting the entire vSAN cluster.
"/var/run/log/vsanmgmt.log", we see below events -YYYY-MM-DDThh:mm:ss.msZ In(14) vsand[2101102]: [opID=23325f16-8602 VsanLsomHealth::checkDiskState] Got devResState from devsTelemetry for disk 527be2c9-####-####-####-f3f9b5467257: DISK_UNDER_STUCK_IO
YYYY-MM-DDThh:mm:ss.msZ In(14) vsand[2101102]: [opID=23325f16-8602 VsanHealthSystemImpl::_QueryPhysicalDiskHealthSummary] Disk 527be2c9-####-####-####-f3f9b5467257 cmmds health status: {'healthFlags': 0, 'timestamp': 42578920677, 'healthReason': 0} , LSOM telemetry status: STUCK_IO_ERROR
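When vsanmgmt.log is large, the affected disk UUIDs can be pulled out by matching the two message shapes shown above. The following is a minimal Python sketch, not a VMware tool: the function name `find_stuck_io_disks` is hypothetical, and the regular expression is inferred only from the sample lines in this article, so it may need adjusting for other builds. The `#` character is accepted in the UUID only because the samples here are masked.

```python
import re

# Assumption: the UUID layout (8-4-4-4-12) and the message shapes are inferred
# from the sample vsanmgmt.log lines above, not from a documented log format.
DISK_UUID = r"[0-9a-fA-F#]{8}(?:-[0-9a-fA-F#]{4}){3}-[0-9a-fA-F#]{12}"
STUCK_IO = re.compile(
    rf"(?:for disk|Disk) (?P<uuid>{DISK_UUID})"
    r".*?(?P<state>DISK_UNDER_STUCK_IO|STUCK_IO_ERROR)"
)


def find_stuck_io_disks(lines):
    """Return {disk_uuid: set of stuck-IO states seen} for the given log lines."""
    hits = {}
    for line in lines:
        match = STUCK_IO.search(line)
        if match:
            hits.setdefault(match.group("uuid"), set()).add(match.group("state"))
    return hits


# The two sample lines from this article, copied as-is (timestamps are masked).
sample = [
    "YYYY-MM-DDThh:mm:ss.msZ In(14) vsand[2101102]: [opID=23325f16-8602 "
    "VsanLsomHealth::checkDiskState] Got devResState from devsTelemetry for disk "
    "527be2c9-####-####-####-f3f9b5467257: DISK_UNDER_STUCK_IO",
    "YYYY-MM-DDThh:mm:ss.msZ In(14) vsand[2101102]: [opID=23325f16-8602 "
    "VsanHealthSystemImpl::_QueryPhysicalDiskHealthSummary] Disk "
    "527be2c9-####-####-####-f3f9b5467257 cmmds health status: "
    "{'healthFlags': 0, 'timestamp': 42578920677, 'healthReason': 0} , "
    "LSOM telemetry status: STUCK_IO_ERROR",
]

for uuid, states in find_stuck_io_disks(sample).items():
    print(uuid, sorted(states))
```

On a host, the same function could be fed the open log file instead of `sample`, since iterating over a file object yields lines.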
"/var/run/log/vobd.log", we see below events -YYYY-MM-DDThh:mm:ss.msZ In(14) vobd[524562]:[scsiCorrelator] 2672996094150us: [vob.scsi.scsipath.pathstate.deadver2] scsiPath vmhba0:C0:T2:L0 changed state from on (device ID: eui.################17##############)YYYY-MM-DDThh:mm:ss.msZ In(14) vobd[524562]:[scsiCorrelator] 2672996094842us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device eui.################17##############. Path vmhba0:C0:T2:L0 is down. Affected datastores: Unknown.YYYY-MM-DDThh:mm:ss.msZ In(14) vobd[524562]:[scsiCorrelator] 2672996094185us: [vob.scsi.device.state.permanentloss] Device :eui.################17############## has been removed or is permanently inaccessible.YYYY-MM-DDThh:mm:ss.msZ In(14) vobd[524562]:[scsiCorrelator] 2672996094972us: [esx.problem.scsi.device.state.permanentloss] Device: eui.################17############## has been removed or is permanently inaccessible. Affected datastores (if any): Unknown.YYYY-MM-DDThh:mm:ss.msZ In(14) vobd[524562]:[vSANCorrelator] 2672996104426us: [vob.vsan.pdl.offline] vSAN device 527be2c9-####-####-####-f3f9b5467257 has gone offline.YYYY-MM-DDThh:mm:ss.msZ In(14) vobd[524562]:[vSANCorrelator] 2672996104441us: [esx.problem.vob.vsan.pdl.offline] vSAN device 527be2c9-####-####-####-f3f9b5467257 has gone offline.YYYY-MM-DDThh:mm:ss.msZ In(14) vobd[524562]:[scsiCorrelator] 2672997219440us: [vob.scsi.device.state.permanentloss.noopens] Permanently inaccessible device :eui.################17############## has no more open connections. It is now safe to unmount datastores (if any) and delete the device.YYYY-MM-DDThh:mm:ss.msZ In(14) vobd[524562]:[scsiCorrelator] 2672997219412us: [esx.problem.scsi.device.state.permanentloss.noopens] Permanently inaccessible device: eui.################17############## has no more opens. 
It is now safe to unmount datastores (if any): Unknown and delete the deviceYYYY-MM-DDThh:mm:ss.msZ In(14) vobd[524562]:[scsiCorrelator] 2672997219912us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T2:L0
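The vobd.log entries above are not written in strict order of their microsecond counters, so it can help to sort them per device to see the sequence: path dead, permanent device loss, then connectivity lost. The sketch below is illustrative only. The helper name `vob_timeline` is hypothetical, and the line layout it assumes (correlator in brackets, a counter ending in `us:`, then the event ID in brackets) is inferred from the samples in this article.

```python
import re

# Assumed layout, taken from the sample vobd.log lines above (not a documented spec):
#   "<timestamp> In(14) vobd[pid]:[correlator] <microseconds>us: [<event.id>] <message>"
VOB_EVENT = re.compile(
    r"\[(?P<correlator>\w+)\]\s+(?P<us>\d+)us:\s+\[(?P<event>[\w.]+)\]"
)


def vob_timeline(lines, needle):
    """Return (microseconds, event_id) tuples for lines containing needle, oldest first."""
    events = []
    for line in lines:
        if needle not in line:
            continue
        match = VOB_EVENT.search(line)
        if match:
            events.append((int(match.group("us")), match.group("event")))
    return sorted(events)


# Masked device ID as it appears in this article.
DEVICE = "eui.################17##############"

# Three of the sample lines above, in the order they appear in the log.
sample = [
    "YYYY-MM-DDThh:mm:ss.msZ In(14) vobd[524562]:[scsiCorrelator] 2672996094150us: "
    f"[vob.scsi.scsipath.pathstate.deadver2] scsiPath vmhba0:C0:T2:L0 changed state from on (device ID: {DEVICE})",
    "YYYY-MM-DDThh:mm:ss.msZ In(14) vobd[524562]:[scsiCorrelator] 2672996094842us: "
    f"[esx.problem.storage.connectivity.lost] Lost connectivity to storage device {DEVICE}. "
    "Path vmhba0:C0:T2:L0 is down. Affected datastores: Unknown.",
    "YYYY-MM-DDThh:mm:ss.msZ In(14) vobd[524562]:[scsiCorrelator] 2672996094185us: "
    f"[vob.scsi.device.state.permanentloss] Device :{DEVICE} has been removed or is permanently inaccessible.",
]

for microseconds, event_id in vob_timeline(sample, DEVICE):
    print(microseconds, event_id)
```

Filtering on a substring keeps the sketch simple; passing the vSAN disk UUID instead of the `eui.` identifier would select the `vob.vsan.pdl.offline` entries in the same way.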
2025-06-26T11:17:07.965Z In(182) vmkernel: cpu71:2255837)WOBTREE: IOLayer_SetDeviceOffline:8596: OfflineDevice t10.NVMe____INTEL_##############____________________################:2 status=Maximum kernel-level retries exceeded errType TRANSIENT
2025-06-26T11:17:16.082Z In(182) vmkernel: cpu112:2286648)StorageDeviceIO: 5697: FDS_DEV_EVENT_REPORT_STUCK_IO event for device t10.NVMe____INTEL_##############____________________################
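To line up the two vmkernel entries above with events in other logs, it can be useful to compute the gap between them from their UTC timestamps. This is a small Python sketch under the assumption that every line starts with a timestamp in the form shown above; `parse_esxi_timestamp` is a hypothetical helper name, and the sample lines are abbreviated copies of the entries in this article.

```python
from datetime import datetime


def parse_esxi_timestamp(line):
    """Parse the leading UTC timestamp (for example 2025-06-26T11:17:07.965Z) of a log line."""
    stamp = line.split(" ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


# Abbreviated copies of the two vmkernel lines shown above.
offline_line = "2025-06-26T11:17:07.965Z In(182) vmkernel: cpu71:2255837)WOBTREE: IOLayer_SetDeviceOffline:8596: OfflineDevice"
stuck_io_line = "2025-06-26T11:17:16.082Z In(182) vmkernel: cpu112:2286648)StorageDeviceIO: 5697: FDS_DEV_EVENT_REPORT_STUCK_IO event for device"

gap = parse_esxi_timestamp(stuck_io_line) - parse_esxi_timestamp(offline_line)
print(f"{gap.total_seconds():.3f} seconds between the two events")  # 8.117 seconds
```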