Warning: 'One of the disks is detected with PDL in vSAN ESA Cluster. Please check the host for further details' on vSAN ESA cluster after updating the BIOS and Firmware on server.

Article ID: 398881

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • After a BIOS and firmware update on the server, NVMe drives start reporting PDL (permanent device loss).

  • vCenter shows the following warning for the vSAN ESA cluster: "One of the disks is detected with PDL in vSAN ESA Cluster. Please check the host for further details"

  • vSAN objects become inaccessible, causing VMs to crash.

  • vMotion of a VM fails with the error "Module VM power on failed".

Environment

VMware vSAN 8.0.3

Cause

The issue occurs when the hardware devices initially appear active but report a PDL state once the host has completed its reboot. The log excerpts below show this behavior.

Reference:
From ESXi, /var/run/log/vmkernel.log:

2025-05-16T03:15:57.372Z Wa(180) vmkwarning: cpu23:2097647)WARNING: HPP: HppDeviceUpdateState:5269: Device 't10.NVMe____MZXLR7T6HALA2D000H3______________________########' is changing to 'APD' from 'permanent device loss'.
2025-05-16T03:15:57.460Z Wa(180) vmkwarning: cpu11:2097644)WARNING: NvmeDeviceIO: 1725: Command 0x9 to device "t10.NVMe____MZXLR7T6HALA2D000H3______________________########" marked for PDL virtual reset completed with  abort/reset: cmdId
2025-05-16T03:15:57.460Z Wa(180) vmkwarning: cpu11:2097644)WARNING: initiator=0x4309e6c2ec40 cmdId.serialNumber=0x2be7)
2025-05-16T03:15:57.460Z Wa(180) vmkwarning: cpu11:2097644)WARNING: NvmeUtil: 151: Error on Cmd(0x45bf0d66cd40) 0x9, CmdSN 0x2be7 from world 0 to component "t10.NVMe____MZXLR7T6HALA2D000H3______________________########"  H:0xe D:0x0 P:0x0
2025-05-16T03:15:57.460Z Wa(180) vmkwarning: cpu2:2100570)WARNING: WOBTREE: vmkio_unmap:1334: GOTO_ON_ERROR [195887410/0xbad0132/Device is permanently unavailable]
2025-05-16T03:15:57.460Z Wa(180) vmkwarning: cpu2:2100570)WARNING: WOBTREE: BAUMIssueUnmap:1227: BAUM: Unmap failed
2025-05-16T03:15:57.460Z Wa(180) vmkwarning: cpu2:2100570)WARNING: WOBTREE: BAUMIssueUnmap:1227: GOTO_ON_ERROR [195887410/0xbad0132/Device is permanently unavailable]
2025-05-16T03:15:58.372Z Wa(180) vmkwarning: cpu23:2097647)WARNING: HPP: HppDeviceUpdateState:5269: Device 't10.NVMe____MZXLR7T6HALA2D000H3______________________########' is changing to 'APD' from 'permanent device loss'.
2025-05-16T03:15:58.372Z Wa(180) vmkwarning: cpu53:2099265)WARNING: HPP: HppAttemptFailoverRequest:1059: Retry world restore device "t10.NVMe____MZXLR7T6HALA2D000H3______________________########" - no more commands to retry
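
As a rough aid for reviewing a long vmkernel.log, the following Python sketch groups the HPP state-change messages by device and counts the "Device is permanently unavailable" errors. The regular expression is derived only from the excerpt above, so the message format is an assumption that may not hold on other ESXi builds; the function name summarize_pdl is hypothetical and not part of any VMware tooling.

```python
import re
import sys

# Pattern based on the HppDeviceUpdateState lines in the excerpt above
# (assumption: other builds may word this message differently).
STATE_CHANGE = re.compile(
    r"^(?P<ts>\S+)\s.*HppDeviceUpdateState:\d+: Device '(?P<device>[^']+)' "
    r"is changing to '(?P<new>[^']+)' from '(?P<old>[^']+)'"
)
PDL_ERROR = "Device is permanently unavailable"


def summarize_pdl(lines):
    """Return ({device: [(timestamp, old_state, new_state), ...]}, pdl_error_count)."""
    transitions = {}
    pdl_errors = 0
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match:
            transitions.setdefault(match["device"], []).append(
                (match["ts"], match["old"], match["new"])
            )
        elif PDL_ERROR in line:
            pdl_errors += 1
    return transitions, pdl_errors


# Optional command-line use: pass the path to a copy of vmkernel.log.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log_file:
        found, errors = summarize_pdl(log_file)
    for device, events in found.items():
        print(f"{device}: {len(events)} state change(s), last at {events[-1][0]}")
    print(f"PDL I/O errors: {errors}")
```

A device that keeps switching between 'APD' and 'permanent device loss' in this summary is a candidate to raise with the hardware vendor.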

From ESXi, /var/run/log/vsandevicemonitord.log:

2025-09-25T06:50:21Z In(14) vsandevicemonitord[2100512]: [938863764160] : Device t10.NVMe____Micron_7450_MTFDKCC6T4TFS__________############### state is DISK_UNDER_STUCK IO

2025-09-25T07:00:21Z In(14) vsandevicemonitord[2100512]: [938863764160] : Device t10.NVMe____Micron_7450_MTFDKCC6T4TFS__________############### state is DISK_UNDER_STUCK IO
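
To see which devices are repeatedly flagged and over what period, a small Python sketch such as the following could collect the timestamps of the stuck-I/O messages per device. The pattern is inferred from the two lines above only, so the exact wording (including the space in "DISK_UNDER_STUCK IO") is an assumption, and stuck_io_report is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Pattern based on the vsandevicemonitord lines in the excerpt above
# (assumption: the message wording may vary between releases).
STUCK_IO = re.compile(
    r"^(?P<ts>\S+)\s.*vsandevicemonitord\[\d+\]:.*"
    r"Device (?P<device>\S+) state is DISK_UNDER_STUCK IO"
)


def stuck_io_report(lines):
    """Return {device: [timestamp, ...]} for every stuck-I/O message found."""
    report = defaultdict(list)
    for line in lines:
        match = STUCK_IO.search(line)
        if match:
            report[match["device"]].append(match["ts"])
    return dict(report)
```

Calling stuck_io_report on the lines of a copied vsandevicemonitord.log returns one list of timestamps per device; the first and last entries of each list give the window during which the device was reported as stuck.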

 

Resolution

Engage the hardware vendor to investigate and resolve the underlying hardware issue.