ESXi hosts that boot from storage controllers based on the Marvell 88SE9230 AHCI chipset, such as the Cisco Boot Optimized M.2 RAID Controller, Dell BOSS S1, or Lenovo ThinkSystem M.2 adapters, may lose connectivity to the boot volume. The failure puts the local storage into an All Paths Down (APD) state, which causes the management agents (hostd and vpxa) to become unresponsive and the host to appear as disconnected in vCenter. High I/O load or a RAID-1 configuration on these AHCI controllers can trigger a hardware deadlock that prevents the driver from recovering the PCIe link.
Symptoms:
On the host console (Alt+F12), you may observe the following errors (the values may vary):

IssueCommand:ERROR Tag 1 SActive already set: SACI:3E CI:3E activeTags:0 reissue_flag:0
<YYYY-MM-DD>T<HH:MM:SS> cpu39:#######)HPP: HppAttemptFailoverRequest:1391: Re-issuing first command for HPP device "t10.ATA_____ThinkSystem_M.2_VD______________________########################" (NO_CONNECT_ON_APD = CLEAR)
WARNING: vmw_ahci[####]:<0] IssueCommand:ERROR: Tag 1 SActive already set: SACT:ffffffff CI:ffffffff activeTags:0 reissue_flag:0
error vpxd cannot contact the specified host (xxxxxx)
ALERT: Bootbank cannot be found at path '/bootbank'
WARNING: HPP: HppAttemptFailoverRequest:####: Re-issuing first command for HPP device "t10.ATA_____CISCO_VD________________________________####

Cause:
A hardware communication failure occurs when the Marvell 88SE9230 AHCI controller encounters a PCIe bus Master Abort or a deadlock. The controller does not follow the AHCI specification during port resets and leaves its status registers in an inconsistent state (indicated by SACT:ffffffff), which prevents the vmw_ahci driver from recovering the device.
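To check a saved log for the messages listed above, a small script can count how often each signature appears. The following is a minimal sketch, not a supported diagnostic tool: the patterns are taken from the log excerpts in this article, while the function name `scan_log` and the signature labels are hypothetical names chosen for illustration. It assumes the log lines have already been read into a list, for example from a copy of the host's VMkernel log.

```python
import re

# Patterns derived from the log excerpts in this article.
# The dictionary keys are hypothetical labels, not VMware terminology.
SIGNATURES = {
    "sactive_already_set": re.compile(
        r"IssueCommand:\s*ERROR:?\s+Tag \d+ SActive already set"
    ),
    "registers_all_ones": re.compile(r"SACT:ffffffff\s+CI:ffffffff"),
    "hpp_reissue": re.compile(
        r"HppAttemptFailoverRequest:\S+ Re-issuing first command"
    ),
    "bootbank_missing": re.compile(r"Bootbank cannot be found at path"),
}


def scan_log(lines):
    """Return a dict mapping each signature label to its number of matching lines."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "WARNING: vmw_ahci[####]:<0] IssueCommand:ERROR: Tag 1 SActive already set: "
        "SACT:ffffffff CI:ffffffff activeTags:0 reissue_flag:0",
        "ALERT: Bootbank cannot be found at path '/bootbank'",
    ]
    for name, count in scan_log(sample).items():
        print(f"{name}: {count}")
```

A nonzero count for `registers_all_ones` corresponds to the SACT:ffffffff state described in this article. The other signatures can also appear in unrelated storage failures, so they are best read together rather than individually.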