vSAN NVMe Controller Recovery Failure NVMEDEV

Article ID: 435534

Updated On:

Products

VMware vSAN
VMware Cloud Foundation

Issue/Introduction


After vSAN isolates an NVMe device, the ESXi host vmkernel logs report the following hardware-level NVMe controller error for that device:

NVMEDEV:9460 Controller <ID> failed to recover after <X> attempts
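
As a minimal sketch for spotting this message in an exported copy of the vmkernel log, the following Python helper matches the message format quoted above and extracts the controller ID and attempt count. The function name `find_recovery_failures` is hypothetical, and the pattern assumes the wording shown in this article; the number after `NVMEDEV:` is matched loosely because it may differ between builds.

```python
import re

# Pattern derived from the message format quoted in this article.
# Assumption: the number after "NVMEDEV:" can vary, so any digits are accepted.
RECOVERY_FAILED = re.compile(
    r"NVMEDEV:\d+\s+Controller\s+(?P<controller>\d+)\s+"
    r"failed to recover after\s+(?P<attempts>\d+)\s+attempts"
)


def find_recovery_failures(lines):
    """Return a list of (controller_id, attempts) for each matching log line."""
    hits = []
    for line in lines:
        match = RECOVERY_FAILED.search(line)
        if match:
            hits.append((int(match.group("controller")), int(match.group("attempts"))))
    return hits
```

The helper accepts any iterable of strings, so it can be fed an open file handle for a log bundle copied off the host. An empty result only means this exact wording was not found, not that the device is healthy.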

Environment

VMware vSAN 8.0 U3 or later
VMware ESXi 8.0 U3 or later
Dell PowerEdge R640 (or similar 14G/15G servers)
Dell Express Flash NVMe P4510

Cause

The NVMe device experiences a hardware or communication fault that causes the host storage stack to lose connectivity to it. In response, vSAN immediately unmounts the disk and marks it as APD (All Paths Down) to preserve cluster stability.


Although vSAN has already isolated the device, recovery of the underlying NVMe controller continues independently, and the "failed to recover after <X> attempts" message is logged when those recovery attempts are exhausted.

Example of related vmkernel log entries:

2026-03-12T21:41:36.976Z Wa(180) vmkwarning: cpu60:2098097)WARNING: NVMEDEV:2347 Ctlr 263, failed to call setNumberIOQueues, status: Failure
2026-03-12T21:41:36.976Z In(182) vmkernel: cpu37:2097705)NvmeUtil: 502: Transient status for command 0x6 set to VMK_STORAGE_RETRY_OPERATION because of an abort/reset before the command timed out: cmdId.initiator=0xXXXXXXXXXXXX cmdId.serialNumber=0x0)
2026-03-12T21:41:36.976Z Wa(180) vmkwarning: cpu60:2098097)WARNING: NVMEDEV:8416 Failed to configure IO queues for controller 263, status: Failure
2026-03-12T21:41:36.976Z Wa(180) vmkwarning: cpu60:2098097)WARNING: NVMEDEV:9698 Controller 263 recovery already active.
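
To see which controller such entries belong to, a short Python sketch like the one below can group NVMEDEV lines by controller ID. It assumes the three spellings seen in the example above ("Ctlr 263", "controller 263", "Controller 263"); the function name `nvmedev_warnings_by_controller` is hypothetical, and other message formats may need additional patterns.

```python
import re
from collections import defaultdict

# Matches "Ctlr <n>", "controller <n>", or "Controller <n>" as in the example lines.
CONTROLLER_ID = re.compile(r"\b(?:Ctlr|[Cc]ontroller)\s+(\d+)")


def nvmedev_warnings_by_controller(lines):
    """Group NVMEDEV log lines by the controller ID they mention."""
    grouped = defaultdict(list)
    for line in lines:
        # Skip lines from other modules (for example NvmeUtil).
        if "NVMEDEV:" not in line:
            continue
        match = CONTROLLER_ID.search(line)
        if match:
            grouped[int(match.group(1))].append(line.rstrip("\n"))
    return dict(grouped)
```

Run against the four example lines, this groups the three NVMEDEV warnings under controller 263 and skips the NvmeUtil line. A controller that accumulates many such entries around the time of the APD event is a reasonable candidate for the device to inspect.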



The vSAN Dying Disk Handling (DDH) feature keeps the disk permanently unmounted, regardless of the NVMe controller's internal recovery attempts or status. This prevents the disk from flapping between mounted and unmounted states and destabilizing the cluster.

Resolution

 

  • Identify the isolated NVMe disk associated with the APD state and controller recovery failures in the host logs.

  • Replace the faulty NVMe device, following your hardware vendor's standard host maintenance and disk replacement procedures.

 

Additional Information

No additional configuration changes are required for the vSAN cluster to handle this specific hardware failure. vSAN Dying Disk Handling (DDH) automatically keeps the degraded device permanently isolated once the initial APD event occurs.

For more information on DDH, see the article "Dying Disk Handling (DDH) in vSAN".