ESXi host disconnected from vCenter and unresponsive Virtual Machines - Marvell 88SE9230 AHCI

Article ID: 370127

Products

VMware vSphere ESXi

Issue/Introduction

ESXi hosts running on storage controllers utilizing the Marvell 88SE9230 AHCI chipset, such as the Cisco Boot Optimized M.2 RAID Controller, Dell BOSS S1, or Lenovo ThinkSystem M.2 adapters, may lose connectivity to the boot volume. This failure results in an All Paths Down (APD) state for the local storage, causing management agents (hostd/vpxa) to become unresponsive and the host to show as disconnected in vCenter. High I/O loads or RAID-1 configurations on these specific AHCI controllers can trigger a hardware deadlock that prevents the driver from recovering the PCIe link.

Symptoms:

  • The ESXi host is disconnected from vCenter and no heartbeat is detected.
  • Virtual machines cannot be migrated with vMotion.
  • On the ESXi DCUI console (via Alt+F12), errors similar to the following may appear (values may vary):
    • IssueCommand:ERROR Tag 1 SActive already set: SACT:3E CI:3E activeTags:0 reissue_flag:0
    • <YYYY-MM-DD>T<HH:MM:SS> cpu39:#######)HPP: HppAttemptFailoverRequest:1391: Re-issuing first command for HPP device "t10.ATA_____ThinkSystem_M.2_VD______________________########################" (NO_CONNECT_ON_APD = CLEAR)
    • WARNING: vmw_ahci[####]:<0] IssueCommand:ERROR: Tag 1 SActive already set: SACT:ffffffff CI:ffffffff activeTags:0 reissue_flag:0
  • In the vpxd logs, the following error appears:
    • error vpxd cannot contact the specified host (xxxxxx)
  • On the ESXi host, the contents of /vmfs/volumes cannot be viewed.
  • Host logs (vmkernel.log) show failed read/write operations and APD alerts:
    • ALERT: Bootbank cannot be found at path '/bootbank'
    • WARNING: HPP: HppAttemptFailoverRequest:####: Re-issuing first command for HPP device "t10.ATA_____CISCO_VD________________________________####
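
The log signatures listed above can be searched for in a copy of vmkernel.log, for example one extracted from a support bundle onto a workstation. The following is a minimal sketch, not a VMware tool: the function name, the signature labels, and the regular expressions are illustrative assumptions derived only from the sample messages in this article, and real log lines may differ between ESXi builds.

```python
import re

# Patterns derived from the sample messages in this article (assumptions, not
# an official list). The dictionary keys are hypothetical labels.
SIGNATURES = {
    "ahci_tag_error": re.compile(r"IssueCommand:ERROR:? Tag \d+ SActive already set"),
    "hpp_reissue": re.compile(r"HppAttemptFailoverRequest:[^:\s]*: Re-issuing first command"),
    "bootbank_missing": re.compile(r"Bootbank cannot be found at path"),
}


def scan_log(lines):
    """Return how many of the given log lines match each signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python scan_vmkernel.py <path to a copy of vmkernel.log>
    with open(sys.argv[1], errors="replace") as handle:
        for name, count in scan_log(handle).items():
            print(f"{name}: {count}")
```

A non-zero count for all three signatures on a host with an affected controller is consistent with the symptoms described here, but it is not proof of the root cause on its own.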

Environment

  • vSphere ESXi 7.x
  • vSphere ESXi 8.x
  • Hosts booting from an M.2 RAID controller with the Marvell 88SE9230 AHCI chipset
    • Cisco UCS C240 M8 (Controller: UCS-M2-HWRAID2)
    • Dell PowerEdge (Controller: BOSS S1)
    • Lenovo ThinkSystem (Controller: M.2 RAID Kit)
    • Others

Cause

A hardware communication failure occurs when the Marvell 88SE9230 AHCI controller encounters a PCIe bus Master Abort or deadlock. The controller does not follow the AHCI specification during port resets, leaving its status registers in an inconsistent state (indicated by SACT:ffffffff) that prevents the vmw_ahci driver from recovering the device.
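
The register values in a vmw_ahci error line can be extracted and classified to see whether the all-ones pattern is present. The sketch below is illustrative only: the function name and the state labels are assumptions, and the interpretation is a heuristic. An all-ones value is typically what a host reads back when a PCIe read does not complete, which fits the Master Abort scenario described above, while other non-zero values with activeTags:0 suggest tags the driver is no longer tracking.

```python
import re

# Matches the SACT and CI fields as they appear in the sample vmw_ahci message.
REGISTERS = re.compile(r"SACT:([0-9A-Fa-f]+)\s+CI:([0-9A-Fa-f]+)")
ALL_ONES = 0xFFFFFFFF


def classify_registers(line):
    """Return (sact, ci, state) for a log line, or None if no registers are found.

    The state labels are hypothetical:
      "all_ones" - both registers read 0xffffffff
      "nonzero"  - at least one register has bits set, but not both all-ones
      "clear"    - both registers are zero
    """
    match = REGISTERS.search(line)
    if match is None:
        return None
    sact, ci = (int(value, 16) for value in match.groups())
    if sact == ALL_ONES and ci == ALL_ONES:
        state = "all_ones"
    elif sact or ci:
        state = "nonzero"
    else:
        state = "clear"
    return sact, ci, state


if __name__ == "__main__":
    example = ("IssueCommand:ERROR: Tag 1 SActive already set: "
               "SACT:ffffffff CI:ffffffff activeTags:0 reissue_flag:0")
    print(classify_registers(example))
```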

Resolution

  • Apply the latest Cisco, Dell, or Lenovo firmware bundle to optimize PCIe link training and AHCI tag handling.
  • Avoid running workloads or VMs on the M.2 drives attached to the controller.
  • Work with the appropriate hardware vendor to investigate the physical hardware (controller, drives, system board) and PCIe connections to the system board.