ESXi Purple Screen Due to QLogic/Marvell HBA Driver Deadlock

Article ID: 380959

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

An ESXi host may fail with a Purple Screen of Death (PSOD) whose symptoms point to a deadlock in the QLogic/Marvell Host Bus Adapter (HBA) driver, qlnativefc. The system logs show repeated "Ran out of IOCBs" warnings immediately before the crash. Typical symptoms:

  • ESXi host encounters a sudden Purple Screen of Death

  • System logs show repeated warnings:
    • qlnativefc: vmhba#(#:#.#): SCM: Ran out of IOCBs, partial data 0x##

  • Final panic message indicates:
    • Spin count exceeded - possible deadlock with PCPU #

  • The backtrace contains multiple references to qlnativefc functions:
    • qla27xxCopyFpinPkt
    • qla24xxProcessResponseQueue
    • qla24xxMsixRspQ
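
To gauge how often the warning appears before a crash, the log can be summarized per adapter. The following is a minimal sketch, not an official tool: the function name count_iocb_warnings is hypothetical, the match pattern is taken from the warning format shown above, and it assumes grep, sed, sort, uniq and awk are available in the shell where the log is examined.

```shell
#!/bin/sh
# Hypothetical helper: count "Ran out of IOCBs" warnings per adapter.
# Argument: path to a vmkernel log file (uncompressed).
count_iocb_warnings() {
    logfile="$1"
    # Keep only the qlnativefc IOCB warnings, reduce each line to the
    # adapter name (vmhbaN), then count occurrences per adapter.
    grep 'qlnativefc: vmhba.*Ran out of IOCBs' "$logfile" |
        sed 's/.*qlnativefc: \(vmhba[0-9]*\).*/\1/' |
        sort | uniq -c |
        awk '{ print $2 ": " $1 }'
}
```

On a live host this might be called as count_iocb_warnings /var/run/log/vmkernel.log; rotated logs would need to be decompressed first. An adapter with a steadily growing count is a candidate for the checks described under Resolution.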

Environment

  • VMware ESXi 7.0 or newer
  • QLogic/Marvell Fibre Channel HBAs
  • qlnativefc driver


Cause

The issue occurs due to a deadlock condition in the qlnativefc driver when processing I/O Control Blocks (IOCBs). This can happen when:

  • The HBA firmware and driver versions are mismatched
  • The installed driver or firmware version is outdated
  • There are underlying storage performance issues causing increased I/O latency

Resolution

  1. Identify current firmware and driver versions:
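    • esxcli software vib list | grep -i qln (lists the installed qlnativefc driver VIB and its version)
    • esxcli storage core adapter list (lists each vmhba and the driver bound to it)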

  2. Update the qlnativefc driver:
    1. Download the latest compatible driver from your hardware vendor
    2. Install it by following the KB article "Download and install async drivers in VMware ESXi"

  3. Update HBA firmware:
    1. Obtain the latest firmware from your hardware vendor
    2. Follow vendor-specific firmware update procedures
    3. Ensure the firmware version is compatible with both your hardware and ESXi version

  4. Verify storage performance:
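    1. Check storage latency using esxtop (press d for the adapter view or u for the device view, and review DAVG/cmd, KAVG/cmd, and GAVG/cmd)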
    2. Review any LUN connectivity issues
    3. Address any identified storage performance problems

  5. After updates:
    1. Reboot the host
    2. Verify driver and firmware versions
    3. Monitor for recurrence of the issue
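
The version checks in steps 1 and 5 can be scripted. The sketch below is illustrative only: the function names qlnativefc_version and verify_qlnativefc_version are hypothetical, and it assumes the output of esxcli software vib list has the VIB name in the first column and the version in the second.

```shell
#!/bin/sh
# Hypothetical helper: read `esxcli software vib list` output on stdin
# and print the version column of the qlnativefc VIB (empty if absent).
qlnativefc_version() {
    awk '$1 == "qlnativefc" { print $2 }'
}

# Hypothetical helper: compare the installed version (read from stdin)
# with the version expected after the update. Returns non-zero on mismatch.
verify_qlnativefc_version() {
    expected="$1"
    actual=$(qlnativefc_version)
    if [ "$actual" = "$expected" ]; then
        echo "OK: qlnativefc $actual"
    else
        echo "MISMATCH: expected $expected, found ${actual:-none}"
        return 1
    fi
}
```

On the host, the output would be piped in, for example esxcli software vib list | verify_qlnativefc_version "<expected version>", where the expected version string comes from the driver package that was installed. The firmware version has to be confirmed separately using the hardware vendor's tools.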

Additional Information

  • Always check the VMware Compatibility Guide or your hardware vendor's compatibility matrix for recommended driver/firmware combinations

  • Consider updating BIOS and other system firmware as part of the resolution

  • Maintain consistent firmware and driver versions across all hosts in a cluster

  • Regular monitoring of storage performance metrics can help identify potential issues before they cause system failures
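
Version consistency across a cluster can be checked once the driver version of each host has been collected. This is a minimal sketch under stated assumptions: the function name check_version_consistency is hypothetical, and the input is assumed to be one "hostname version" pair per line, gathered by whatever means is available in the environment.

```shell
#!/bin/sh
# Hypothetical helper: read "hostname version" pairs on stdin.
# Prints nothing and returns 0 when every host reports the same version;
# otherwise lists the hosts grouped by version and returns 1.
check_version_consistency() {
    awk '
        NF >= 2 { count[$2]++; hosts[$2] = hosts[$2] " " $1 }
        END {
            n = 0
            for (v in count) n++
            if (n > 1) {
                for (v in count) print v ":" hosts[v]
                exit 1
            }
        }
    '
}
```

For example, check_version_consistency < versions.txt, where versions.txt is a hypothetical file holding the collected pairs. Any host listed under a different version than the rest is the one to bring in line with the recommended driver/firmware combination.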