VMware vSAN [All Versions]
I/O timeouts occurred first. The driver then received a DPC (Downstream Port Containment) error event, indicating that a PCIe error had been detected on the device. These events confirm that the underlying device was already in a faulty state at the time the I/O timeouts occurred.
Sequence of events:
2025-10-20T07:04:35.140Z In(182) vmkernel: cpu1:2098056)NvmeDeviceIO: 1865: Start TSC for CmdSN 402a6fc9 is 6827793373 ms
2025-10-20T07:04:35.140Z In(182) vmkernel: cpu1:2098056)NVMEPSA:1345 taskMgmt:abort cmdId.initiator=0x430bb43d8700 CmdSN 0x402a6fc9 world:0 controller 260 state:5 nsid:1
2025-10-20T07:04:35.140Z In(182) vmkernel: cpu1:2098056)NVMEIO:3974 Ctlr 260, ns 1, tmReq 0x431e83b73e80, type 1, initiator 0x430bb43d8700, sn 0x402a6fc9, world id 0.
2025-10-20T07:04:35.140Z In(182) vmkernel: cpu10:2098203)NVMEIO:4654 ctlr 260, queue 1, cid 870, cap 0x3, count 0, found cmd 0x45c39a221e00 (initiator 0x430bb43d8700, serialNumber 0x402a6fc9, worldID 0)
2025-10-20T07:04:35.140Z In(182) vmkernel: cpu10:2098203)NVMEIO:4770 Issuing command to cancel cmd 0x45c39a221e00 (tag 0x0) on queue 1, tracker 0x431e83b123c0, cid 870
2025-10-20T07:04:35.140Z In(182) vmkernel: cpu10:2098203)NVMEIO:4776 cmd2Abort 0x45c39a221e00, opcode 0x2, nsid 1, lba 1653849216, lbc 127
2025-10-20T07:04:37.140Z In(182) vmkernel: cpu5:2098056)StorageDeviceIO: 5697: FDS_DEV_EVENT_REPORT_STUCK_IO event for device t10.NVMe____Dell_Express_Flash_NVMe_P4510_4TB_SFF___################
2025-10-20T07:04:37.140Z Wa(180) vmkwarning: cpu106:2098027)WARNING: PLOG: PLOG_DeviceHandleIOTimeOut:8792: vSAN device 525ca049-####-####-####-190b6fa0656a detected I/O timeout error. This may lead to stuck I/O.
2025-10-20T07:04:41.143Z In(182) vmkernel: cpu0:2098205)NVMEDEV:8245 Resetting controller 260 (nqn.2014-08.org.nvmexpress_8086_Dell_Express_Flash_NVMe_P4510_4TB_SFF___################)
2025-10-20T07:04:41.181Z Wa(180) vmkwarning: cpu1:2098205)WARNING: NVMEDEV:8343 Failed to enable controller 260, status: Device is permanently unavailable
2025-10-20T07:04:41.181Z In(182) vmkernel: cpu80:2098336)NvmeAdapter: 3051: Unregistering adapter vmhba6
2025-10-20T07:04:41.182Z In(182) vmkernel: cpu80:2098336)StoragePsaDriver: 634: device 0x7b46430e74210533 Detach complete [status=Success]
2025-10-20T07:04:41.182Z In(182) vmkernel: cpu80:2098336)Device: 412: storage_psa:driver->ops.detachDevice:0 ms
2025-10-20T07:04:41.182Z In(182) vmkernel: cpu80:2098336)Device: 1721: Unregistered device: 0x430e74201220 logical#pci#p0000:c6:00.0#0#0 com.vmware.StorHBAPort
2025-10-20T07:04:41.182Z Wa(180) vmkwarning: cpu80:2098336)WARNING: NvmeAdapter: 3155: Releasing adapter vmhba6...
2025-10-20T07:04:41.182Z Wa(180) vmkwarning: cpu80:2098336)WARNING: NVMEDEV:3236 Failed to read controller csts register, status: Device is permanently unavailable...
2025-10-20T07:04:41.182Z In(182) vmkernel: cpu80:2098336)nvme_pcie001980000:RemoveDevice:232:Device 0x62d8430e7420f3f3 removed....
2025-10-20T07:04:41.371Z In(182) vmkernel: cpu3:2098030)HPP: HppPathGroupMovePath:688: Path "vmhba6:C0:T0:L0" state changed from "active" to "dead"
2025-10-20T07:04:41.371Z In(182) vmkernel: cpu3:2098030)PLOG: PLOGLogDiskEvent:4135: Disk Event unplug for MD 525ca049-####-####-####-190b6fa0656a (t10.NVMe____Dell_Express_Flash_NVMe_P4510_4TB_SFF___000194F1BFE4D25C:2)...
2025-10-20T07:04:41.381Z Wa(180) vmkwarning: cpu44:2099494)WARNING: LSOM: LSOMEventNotify:9026: vSAN device 525ca049-####-####-####-190b6fa0656a has gone offline.
2025-10-20T07:04:41.179Z In(182) vmkernel: cpu1:2097691)PCIEDPC: 1220: 0000:c0:03.3: Port experienced DPC, reason RP PIO error
2025-10-20T07:04:41.179Z In(182) vmkernel: cpu1:2097691)PCIEErrRecov: 194: 0000:c0:03.3: Request made to remove device 0000:c6:00.0 from device layer...
2025-10-20T07:04:41.179Z In(182) vmkernel: cpu80:2098336)nvme_pcie:ForgetDevice:411:Called with 0x6225430e7420c5b2.
2025-10-20T07:04:41.179Z In(182) vmkernel: cpu80:2098336)nvme_pcie001980000:ForgetDevice:419:Device 0x6225430e7420c5b2 forgotten.
Engage the hardware vendor to assess the health of the NVMe disk reporting the DPC error.
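
To check whether another host shows the same ordering of events, the relevant entries can be pulled from a vmkernel log and sorted by timestamp. The following is a minimal Python sketch, not a VMware tool. The marker strings are copied from the log excerpt in this article, while the labels and the function names `split_entries` and `timeline` are illustrative assumptions. It also assumes every entry begins with a UTC timestamp in the format shown above, and it splits entries that have been fused onto one line.

```python
import re
from datetime import datetime

# Timestamp format at the start of each vmkernel entry, e.g. 2025-10-20T07:04:35.140Z
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z")

# Substrings taken from the log excerpt in this article; the labels are illustrative.
MARKERS = {
    "taskMgmt:abort": "I/O abort",
    "detected I/O timeout error": "vSAN I/O timeout",
    "Port experienced DPC": "PCIe DPC",
    "Failed to enable controller": "controller reset failed",
    "has gone offline": "vSAN device offline",
}


def split_entries(text):
    """Split raw log text into entries, one per timestamp, even if lines are fused."""
    starts = [m.start() for m in TS_RE.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[a:b].strip() for a, b in zip(starts, ends)]


def timeline(text):
    """Return (timestamp, label) pairs for the marker events, in chronological order."""
    events = []
    for entry in split_entries(text):
        # The timestamp is always the first 24 characters of an entry.
        stamp = datetime.strptime(entry[:24], "%Y-%m-%dT%H:%M:%S.%fZ")
        for needle, label in MARKERS.items():
            if needle in entry:
                events.append((stamp, label))
    return sorted(events)
```

Calling `timeline()` on the text of a vmkernel log returns the matching events in time order, which makes it easier to see whether the I/O aborts precede the DPC event, as they do in the excerpt above.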