Error: "PCI passthru device caused IOMMU fault" when VM Powers Off Unexpectedly
search cancel

Error: "PCI passthru device caused IOMMU fault" when VM Powers Off Unexpectedly

book

Article ID: 392714

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Virtual machines with PCI passthrough devices may unexpectedly power off with the following error message in the ESXi logs:

VMware ESX unrecoverable error: (vmx) PCI passthru device 0000:##:00.# caused an IOMMU fault type 4 at address 0x###########

This error primarily occurs with Intel accelerator cards (such as AC100/E810) using PCI passthrough and results in virtual machine disruption or termination.

Environment

  • VMware vSphere ESXi hosts
  • Virtual machines using PCI passthrough
  • Intel accelerator cards (specifically AC100 and E810 models)
  • Device using Direct Memory Access (DMA) capabilities

Cause

The Input-Output Memory Management Unit (IOMMU) is a hardware component that connects DMA-capable I/O buses to system memory. It maps device-visible virtual addresses to physical addresses.

The error occurs at the hardware level when the PCI passthrough device attempts an invalid memory operation that the IOMMU detects and blocks. This is primarily a hardware issue with the accelerator card, its driver, or the application using it, rather than an ESXi software issue.

Potential specific causes include:

  1. Hardware faults in the accelerator card
  2. Driver incompatibility or bugs
  3. Application-level issues with the software using the device
  4. Firmware mismatches between components

Resolution

Step 1: Contact the device vendor

Since this is a hardware-level issue, contact the device vendor (in this case, Intel) for:

  • Firmware updates
  • Driver updates
  • Known issues with your specific hardware model
  • Device-specific debugging tools

Step 2: Collect diagnostic information

  1. Gather ESXi host logs showing the error: a. Log in to the vSphere Client b. Select the affected ESXi host c. Navigate to Monitor > Logs d. Generate a support bundle by clicking "Export System Logs" or run vm-support on the ESXi host directly

  2. Document the exact error message including: a. The device identifier (e.g., 0000:8b:00.1) b. The IOMMU fault type c. The address where the fault occurred

  3. Identify the affected hardware using the ESXi shell:

    lspci | grep -i accelerator
    

    Note the device model, vendor information, and device IDs

Step 4: Work with hardware and application vendors

  1. Schedule a joint troubleshooting session with both the device vendor and application vendor
  2. Implement any recommended diagnostic instrumentation in a lab environment first
  3. Consider a rollback to previous versions of hardware drivers or application software if the issue began after an upgrade

Additional Information

If IOMMU faults occur regularly, check for patterns such as:

  • Specific workloads or traffic patterns that trigger the issue
  • Timing patterns (do issues occur at peak usage times?)
  • Correlation with other system events
  • Host resource utilization at the time of failure

Related documentation: