The vSAN device randomly goes into APD (All Paths Down) state and recovers within minutes.

Article ID: 394013

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • The user notices that a vSAN device randomly enters the APD state.
  • Skyline Health and the server's out-of-band management interface (iDRAC/iLO) report no issues for the affected drive.
  • There are no SMART alerts for the affected drive either.

Environment

VMware vSAN 7.0.x

VMware vSAN 8.0.x

Cause

  • In vobd.log, the user may see entries similar to the following. The date, time, and device identifier will vary by environment.

2025-03-04T01:20:53.015Z In(14) vobd[2097866]:  [APDCorrelator] 7566540858729us: [vob.storage.apd.start] Device or filesystem with identifier [t10.NVMe__xxxxxxxxxx___xxxxxxxxxx] has entered the All Paths Down state.
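To find every APD start event in a large vobd.log, the matching lines can be extracted with a short script. This is a minimal sketch, assuming the log lines follow exactly the format of the excerpt above; the function name parse_apd_starts is hypothetical, and other ESXi builds may word the message differently.

```python
import re
from datetime import datetime

# Assumed format, copied from the vobd.log excerpt: "<ISO timestamp>Z ... [vob.storage.apd.start] ... [<device id>] ..."
APD_START_RE = re.compile(
    r"^(?P<ts>\S+Z) .*\[vob\.storage\.apd\.start\] Device or filesystem with identifier "
    r"\[(?P<device>[^\]]+)\] has entered the All Paths Down state"
)


def parse_apd_starts(lines):
    """Return (timestamp, device_id) tuples for every APD start event in the given lines."""
    events = []
    for line in lines:
        match = APD_START_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
            events.append((ts, match.group("device")))
    return events


if __name__ == "__main__":
    # Hypothetical path: point this at a copy of vobd.log from the affected host.
    with open("vobd.log", encoding="utf-8", errors="replace") as log:
        for ts, device in parse_apd_starts(log):
            print(ts.isoformat(), device)
```

Matching on the vob.storage.apd.start event ID rather than free text keeps the filter narrow; the timestamps it returns can then be compared against VMkernel events from the same window.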

  • When checking the VMkernel logs around the time the drive went offline, the user may see warnings similar to the following.

2025-03-04T01:20:52.808Z Wa(180) vmkwarning: cpu0:2097267)WARNING: HPP: HppNvmeThrottleLogForDevice:599: NVMe Cmd 0x2 (0x4xxxxxxxxx, 0) to dev "t10.NVMe__xxxxxxxxxx___xxxxxxxxxx" on path "vmhba0:C0:T0:L0" Failed:
2025-03-04T01:20:52.808Z Wa(180) vmkwarning: cpu0:2097267)WARNING: HPP: HppNvmeThrottleLogForDevice:607: Error status H:0x7 D:0x0 P:0x0 hppAction = 2

H:0x7 D:0x0 P:0x0: this status is returned when a device has been reset due to a Storage Initiator Error. It typically occurs because of outdated HBA firmware or, less commonly, a faulty HBA.
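The H/D/P triple holds the host (initiator), device, and plugin status codes. A minimal sketch for pulling them out of a log line is shown below; it maps only the two host codes relevant here (0x0 and the 0x7 Storage Initiator Error described above), and the initiator_side flag is a hypothetical heuristic, not an official classification: a nonzero host status with zero device and plugin status points at the initiator side rather than the drive.

```python
import re

# Only the host status codes discussed in this article; anything else is reported as unmapped.
HOST_STATUS = {
    0x0: "OK",
    0x7: "Storage Initiator Error",
}

STATUS_RE = re.compile(
    r"H:0x(?P<h>[0-9a-fA-F]+) D:0x(?P<d>[0-9a-fA-F]+) P:0x(?P<p>[0-9a-fA-F]+)"
)


def decode_status(line):
    """Decode the H:/D:/P: status codes in a log line, or return None if none are present."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, plugin = (int(match.group(k), 16) for k in ("h", "d", "p"))
    return {
        "host": host,
        "device": device,
        "plugin": plugin,
        "host_meaning": HOST_STATUS.get(host, "unmapped host status"),
        # Heuristic: the failure was reported by the initiator (HBA), not by the drive or plugin.
        "initiator_side": host != 0 and device == 0 and plugin == 0,
    }
```

Applied to the HppNvmeThrottleLogForDevice warning above, this yields host 0x7 with device and plugin both 0x0, which is why the investigation moves to the HBA rather than the drive.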

  • Checking what happened to the HBA at the same time, the user sees the path being marked NO_CONNECT and the adapter being unregistered:

2025-03-04T01:20:52.814Z Wa(180) vmkwarning: cpu25:2098164)WARNING: NvmeDiscover: 5149: Mark path vmhba0:C0:T0:L0 as NO_CONNECT
2025-03-04T01:20:52.814Z In(182) vmkernel: cpu25:2098164)NVMEPSA:1622 adpater: vmhba0, action: 1
2025-03-04T01:20:52.814Z In(182) vmkernel: cpu25:2098164)NvmeAdapter: 2991: Unregistering adapter vmhba0
2025-03-04T01:20:52.814Z In(182) vmkernel: cpu25:2098164)StoragePsaDriver: 634: device 0x2xxxxxxxxxx Detach complete [status=Success]
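The path and adapter events above can be collected the same way. This minimal sketch assumes the message text matches the excerpts exactly ("Mark path ... as NO_CONNECT" and "Unregistering adapter ..."); the event names path_no_connect and adapter_unregister and the function find_hba_events are hypothetical labels, and other driver versions may log these events differently.

```python
import re

# Patterns copied from the vmkernel excerpts; the first capture group is the affected path or adapter.
HBA_EVENT_PATTERNS = {
    "path_no_connect": re.compile(r"Mark path (?P<path>\S+) as NO_CONNECT"),
    "adapter_unregister": re.compile(r"Unregistering adapter (?P<adapter>\S+)"),
}


def find_hba_events(lines):
    """Return (timestamp, event_name, target) tuples for HBA path/adapter loss events."""
    events = []
    for line in lines:
        # The leading token of each vmkernel line is its ISO 8601 timestamp.
        ts = line.split(" ", 1)[0]
        for name, pattern in HBA_EVENT_PATTERNS.items():
            match = pattern.search(line)
            if match:
                events.append((ts, name, match.group(1)))
    return events
```

Both events carrying the same vmhba0 identifier, milliseconds after the H:0x7 warning, is what ties the device failure to the adapter going away.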

  • Because the HBA resets, the drive behind it (here, the vSAN cache drive) also goes offline and enters APD. Once the reset completes, the device comes back online.
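To confirm this ordering across many occurrences, each APD start can be checked for an HBA error shortly before it. This is a minimal sketch under stated assumptions: timestamps are in the ISO format shown in the excerpts, and the 5-second window is an arbitrary, hypothetical choice, not a documented threshold. In the excerpts above, the H:0x7 warning (01:20:52.808Z) precedes the APD start (01:20:53.015Z) by about 0.2 seconds.

```python
from datetime import datetime, timedelta


def parse_ts(ts):
    """Parse an ESXi log timestamp such as 2025-03-04T01:20:53.015Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def apd_preceded_by_hba_reset(apd_ts, hba_ts_list, window=timedelta(seconds=5)):
    """True if any HBA error timestamp falls within `window` before the APD start.

    The default 5-second window is an assumption; widen or narrow it for your logs.
    """
    apd = parse_ts(apd_ts)
    return any(timedelta(0) <= apd - parse_ts(t) <= window for t in hba_ts_list)
```

An APD event with no HBA error in the window would point away from this cause and toward the drive or another component.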

Resolution

  • HBA resets like this are typically caused by outdated HBA firmware or, less commonly, a faulty HBA.

  • Verify that the HBA driver and firmware versions are compatible and up to date (for example, against the VMware Compatibility Guide for vSAN). If the driver and firmware are correct, check the HBA for hardware issues and replace it if necessary.