Repeated All-Paths-Down (APD) errors on Micron NVMe drives

Article ID: 412411

Updated On:

Products

VMware vSphere ESXi
VMware vSphere ESXi 8.0

Issue/Introduction

Symptoms: 

  • You have NVMe drives installed locally in the ESXi host.
  • The hosts are reporting "All-Paths-Down" status for the NVMe drive(s):

2025-09-25T21:27:01.197Z In(14) vobd[2099683]:  [vmfsCorrelator] 6927271523835us: [vob.vmfs.heartbeat.timedout] 68681008-ed818838-caad-1423f2c0b230 LOCAL_DATASTORE01
2025-09-25T21:27:01.197Z In(14) vobd[2099683]:  [vmfsCorrelator] 6927390774021us: [esx.problem.vmfs.heartbeat.timedout] 68681008-ed818838-caad-1423f2c0b230 LOCAL_DATASTORE01
2025-09-25T21:27:41.583Z In(14) vobd[2099683]:  [APDCorrelator] 6927311909286us: [vob.storage.apd.start] Device or filesystem with identifier [t10.NVMe____XXXXXXXXXXXXXXXXXXXXXXXXX________________YYYYYYYYYYYYYYYYY] has entered the All Paths Down state.
2025-09-25T21:27:41.583Z In(14) vobd[2099683]:  [psastorCorrelator] 6927311909218us: [vob.psastor.psastorpath.pathstate.dead] storagePath vmhba1:C0:T0:L0 changed state from on (device ID: t10.NVMe____XXXXXXXXXXXXXXXXXXXXXXXXX________________YYYYYYYYYYYYYYYYY)
2025-09-25T21:27:41.583Z In(14) vobd[2099683]:  [APDCorrelator] 6927431159954us: [esx.problem.storage.apd.start] Device or filesystem with identifier [t10.NVMe____XXXXXXXXXXXXXXXXXXXXXXXXX________________YYYYYYYYYYYYYYYYY] has entered the All Paths Down state.
2025-09-25T21:27:41.592Z In(14) vobd[2099683]:  [psastorCorrelator] 6927431168453us: [esx.problem.storage.connectivity.lost] Lost connectivity to storage device t10.NVMe____XXXXXXXXXXXXXXXXXXXXXXXXX________________YYYYYYYYYYYYYYYYY. Path vmhba1:C0:T0:L0 is down. Affected datastores: "LOCAL_DATASTORE01".
2025-09-25T21:30:01.587Z In(14) vobd[2099683]:  [APDCorrelator] 6927451910165us: [vob.storage.apd.timeout] Device or filesystem with identifier [t10.NVMe____XXXXXXXXXXXXXXXXXXXXXXXXX________________YYYYYYYYYYYYYYYYY] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2025-09-25T21:30:01.587Z In(14) vobd[2099683]:  [APDCorrelator] 6927571163551us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [t10.NVMe____XXXXXXXXXXXXXXXXXXXXXXXXX________________YYYYYYYYYYYYYYYYY] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.

 

  • You see entries in the vmkernel.log similar to the following, showing a 0x808 status:
2025-09-25T20:58:13.167Z Wa(180) vmkwarning: cpu298:2100026)WARNING: HPP: HppNvmeThrottleLogForDevice:600: NVMe Cmd 0x2 (0x45db81be0cc0, 4358304) to dev "t10.NVMe____XXXXXXXXXXXXXXXXXXXXXXXXX________________YYYYYYYYYYYYYYYYY" on path "vmhba1:C0:T0:L0" Failed:
2025-09-25T20:58:13.167Z Wa(180) vmkwarning: cpu298:2100026)WARNING: HPP: HppNvmeThrottleLogForDevice:608: Error status H:0x9 D:0x0 P:0x0 hppAction = 1
2025-09-25T20:58:13.167Z Wa(180) vmkwarning: cpu298:2100026)WARNING: NVMEPSA:217 Complete vmkNvmeCmd: 0x45bb8f7a90c0, vmkPsaCmd: 0x45db81be0cc0, cmdId.initiator=0x4311c8c17d80, CmdSN: 0x120, status: 0x808
 
  • You may also see virtual resets ("Virt reset issued") being sent to the NVMe device and commands being aborted in conjunction with the above errors:
2025-09-25T21:27:01.197Z In(182) vmkernel: cpu3:2100134)NvmeDeviceIO: 3315: Virt reset issued on device t10.NVMe____XXXXXXXXXXXXXXXXXXXXXXXXX________________YYYYYYYYYYYYYYYYY
2025-09-25T21:27:01.197Z In(182) vmkernel: cpu103:2099594)NVMEPSA:1345 taskMgmt:abort cmdId.initiator=0x430c72ac9f00 CmdSN 0x557fe25 world:2097544 controller 261 state:5 nsid:1
2025-09-25T21:27:01.197Z In(182) vmkernel: cpu103:2099594)NVMEIO:3974 Ctlr 261, ns 1, tmReq 0x431f44f2c320, type 1, initiator 0x430c72ac9f00, sn 0x557fe25, world id 2097544.
2025-09-25T21:27:01.197Z In(182) vmkernel: cpu103:2099594)NvmeUtil: 470: Transient status for command 0x5 set to VMK_ABORTED because retries were inhibited (e.g., by a guest virtual device reset): cmdId.initiator=0x430c72ac9f00 cmdId.serialNumber=0x557fe25)
 
  • The I/O being issued to the datastore is minimal.
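
The APD symptoms above can be checked mechanically. The following is a minimal Python sketch, not a VMware tool, that scans vobd log lines in the format shown above, pairs each vob.storage.apd.start event with the next vob.storage.apd.timeout event for the same device, and reports the elapsed time. The function name apd_intervals and the regular expression are illustrative assumptions based only on the sample lines in this article; other builds may log these events differently.

```python
import re
from datetime import datetime

# Pattern assumed from the sample vobd lines in this article.
# It matches only the "vob." events, so the duplicate "esx.problem." lines are skipped.
APD_EVENT = re.compile(
    r"^(?P<ts>\S+)\s.*\[vob\.storage\.apd\.(?P<kind>start|timeout)\].*"
    r"identifier \[(?P<dev>[^\]]+)\]"
)


def parse_ts(ts):
    # Timestamps in the samples look like 2025-09-25T21:27:41.583Z.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def apd_intervals(lines):
    """Return (device, start, timeout, seconds) for each APD start followed by a timeout."""
    pending = {}   # device identifier -> time of the most recent APD start
    results = []
    for line in lines:
        m = APD_EVENT.match(line)
        if not m:
            continue
        when = parse_ts(m.group("ts"))
        dev = m.group("dev")
        if m.group("kind") == "start":
            pending[dev] = when
        elif dev in pending:
            start = pending.pop(dev)
            results.append((dev, start, when, (when - start).total_seconds()))
    return results
```

Run against the sample lines above, the single start/timeout pair is about 140 seconds apart (21:27:41.583 to 21:30:01.587), which agrees with the "140 seconds" reported in the timeout message.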

Environment

  • VMware ESXi 8.x

Cause

  • The "status: 0x808" in the above messages indicates that the drives are busy and unable to respond to storage commands, so those commands ultimately time out.
  • As a result, guests issue SCSI resets to the device(s) to clear any pending reservations and pending I/O in an attempt to return the device to its initial power-on state.
  • This is a 3rd party issue.

Resolution

  • Please work with your hardware vendor to investigate why the drives are reporting a busy status.
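
When engaging the hardware vendor, a timeline of the busy completions may be useful. The following is a minimal Python sketch, not a VMware tool, that counts NVMEPSA completion lines reporting "status: 0x808" per minute. The function name busy_completions_per_minute and the regular expression are illustrative assumptions based only on the sample vmkernel.log line in this article.

```python
import re
from collections import Counter

# Pattern assumed from the sample NVMEPSA line in this article.
# The first group captures the timestamp truncated to the minute.
BUSY_COMPLETION = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\S+\s.*NVMEPSA:.*status: 0x808\b"
)


def busy_completions_per_minute(lines):
    """Count NVMEPSA completions with status 0x808, keyed by minute."""
    counts = Counter()
    for line in lines:
        m = BUSY_COMPLETION.match(line)
        if m:
            counts[m.group("minute")] += 1
    return counts
```

For example, passing the lines of vmkernel.log (read with open(...) from a log bundle) returns a Counter whose keys are minutes such as 2025-09-25T20:58; sorting its items gives a timeline that can be compared against the APD events.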

Additional Information