After a BIOS and firmware update, NVMe drives began reporting permanent device loss (PDL).
The vSAN ESA cluster shows the following warning in vCenter: "One of the disks is detected with PDL in vSAN ESA Cluster. Please check the host for further details"
Virtual machine power-on fails with the error "Module VM power on failed"
VMware vSAN 8.0.3
The issue occurs when the hardware devices initially appear active and then report a PDL state once the reboot completes. Refer to the log snippets below.
Reference:
From ESXi, /var/run/log/vmkernel.log:
2025-05-16T03:15:57.372Z Wa(180) vmkwarning: cpu23:2097647)WARNING: HPP: HppDeviceUpdateState:5269: Device 't10.NVMe____MZXLR7T6HALA2D000H3______________________########' is changing to 'APD' from 'permanent device loss'.
2025-05-16T03:15:57.460Z Wa(180) vmkwarning: cpu11:2097644)WARNING: NvmeDeviceIO: 1725: Command 0x9 to device "t10.NVMe____MZXLR7T6HALA2D000H3______________________########" marked for PDL virtual reset completed with abort/reset: cmdId
2025-05-16T03:15:57.460Z Wa(180) vmkwarning: cpu11:2097644)WARNING: initiator=0x4309e6c2ec40 cmdId.serialNumber=0x2be7)
2025-05-16T03:15:57.460Z Wa(180) vmkwarning: cpu11:2097644)WARNING: NvmeUtil: 151: Error on Cmd(0x45bf0d66cd40) 0x9, CmdSN 0x2be7 from world 0 to component "t10.NVMe____MZXLR7T6HALA2D000H3______________________########" H:0xe D:0x0 P:0x0
2025-05-16T03:15:57.460Z Wa(180) vmkwarning: cpu2:2100570)WARNING: WOBTREE: vmkio_unmap:1334: GOTO_ON_ERROR [195887410/0xbad0132/Device is permanently unavailable]
2025-05-16T03:15:57.460Z Wa(180) vmkwarning: cpu2:2100570)WARNING: WOBTREE: BAUMIssueUnmap:1227: BAUM: Unmap failed
2025-05-16T03:15:57.460Z Wa(180) vmkwarning: cpu2:2100570)WARNING: WOBTREE: BAUMIssueUnmap:1227: GOTO_ON_ERROR [195887410/0xbad0132/Device is permanently unavailable]
2025-05-16T03:15:58.372Z Wa(180) vmkwarning: cpu23:2097647)WARNING: HPP: HppDeviceUpdateState:5269: Device 't10.NVMe____MZXLR7T6HALA2D000H3______________________########' is changing to 'APD' from 'permanent device loss'.
2025-05-16T03:15:58.372Z Wa(180) vmkwarning: cpu53:2099265)WARNING: HPP: HppAttemptFailoverRequest:1059: Retry world restore device "t10.NVMe____MZXLR7T6HALA2D000H3______________________########" - no more commands to retry
From ESXi, /var/run/log/vsandevicemonitord.log:
2025-09-25T06:50:21Z In(14) vsandevicemonitord[2100512]: [938863764160] : Device t10.NVMe____Micron_7450_MTFDKCC6T4TFS__________############### state is DISK_UNDER_STUCK IO
2025-09-25T07:00:21Z In(14) vsandevicemonitord[2100512]: [938863764160] : Device t10.NVMe____Micron_7450_MTFDKCC6T4TFS__________############### state is DISK_UNDER_STUCK IO
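When reviewing a log bundle, it can help to list which devices the messages above refer to. The following is a minimal Python sketch, not a VMware-provided tool. The function name `find_pdl_devices`, the marker strings, and the device-ID pattern are assumptions derived only from the sample messages in this article, so other drive models or ESXi builds may log different text.

```python
import re
from collections import Counter

# Assumption: device identifiers look like the t10.NVMe... names in the
# excerpts above and end at whitespace or a quote character.
DEVICE_RE = re.compile(r"t10\.NVMe[^\s'\"]+")

# Assumption: these substrings, taken from the sample messages, indicate
# a PDL or stuck-I/O condition.
MARKERS = ("permanent device loss", "marked for PDL", "DISK_UNDER_STUCK")


def find_pdl_devices(lines):
    """Count, per device ID, the log lines that contain a PDL marker."""
    hits = Counter()
    for line in lines:
        if not any(marker in line for marker in MARKERS):
            continue
        match = DEVICE_RE.search(line)
        if match:
            hits[match.group(0)] += 1
    return hits


# Hypothetical usage against a copy of the log file:
#   with open("vmkernel.log", errors="replace") as fh:
#       for device, count in find_pdl_devices(fh).most_common():
#           print(count, device)
```

Lines that carry a marker but no device ID (for example the WOBTREE messages) are skipped, so the counts only indicate which devices to look at first and are not a complete tally of errors.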
Engage the hardware vendor to investigate and resolve the underlying hardware issue.