Error: "PCI passthru device caused IOMMU fault" when VM Powers Off Unexpectedly

Article ID: 392714

Products

VMware vSphere ESXi

Issue/Introduction

Virtual machines with PCI passthrough devices may unexpectedly power off with the following error message in the ESXi logs:

VMware ESX unrecoverable error: (vmx) PCI passthru device 0000:##:00.# caused an IOMMU fault type 4 at address 0x###########

This error primarily occurs with Intel accelerator cards (such as the AC100 and E810) attached through PCI passthrough, and it causes the affected virtual machine to be disrupted or terminated.

Environment

VMware vSphere ESXi hosts
Virtual machines using PCI passthrough
Intel accelerator cards (specifically AC100 and E810 models)
Devices that use Direct Memory Access (DMA)

Cause

The Input-Output Memory Management Unit (IOMMU) is a hardware component that connects DMA-capable I/O buses to system memory. It maps device-visible virtual addresses to physical addresses.

The error occurs at the hardware level: the PCI passthrough device attempts an invalid memory operation, and the IOMMU detects and blocks it. The root cause typically lies with the accelerator card, its driver, or the application using it, rather than with ESXi itself.
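
The translate-and-block behavior described above can be illustrated with a toy model. The following Python sketch is only a conceptual aid and does not reflect how ESXi or any real IOMMU is implemented; the class names ToyIommu and IommuFault, the 4 KiB page size, and all addresses are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class IommuFault(Exception):
    """Raised when a device accesses an address with no valid mapping."""


class ToyIommu:
    def __init__(self):
        # Maps device-visible page numbers to (physical page number, writable).
        self.mappings = {}

    def map_page(self, device_page, physical_page, writable=True):
        self.mappings[device_page] = (physical_page, writable)

    def translate(self, device_addr, write=False):
        # Split the device-visible address into a page number and an offset.
        page, offset = divmod(device_addr, PAGE_SIZE)
        entry = self.mappings.get(page)
        if entry is None:
            raise IommuFault(f"no mapping for device address {device_addr:#x}")
        physical_page, writable = entry
        if write and not writable:
            raise IommuFault(f"write to read-only mapping at {device_addr:#x}")
        return physical_page * PAGE_SIZE + offset


iommu = ToyIommu()
iommu.map_page(device_page=0x10, physical_page=0x8000)
print(hex(iommu.translate(0x10010)))  # a mapped address translates normally
try:
    iommu.translate(0x20000)  # page 0x20 was never mapped, so this is blocked
except IommuFault as exc:
    print("blocked:", exc)
```

In this model, a DMA request to an unmapped or write-protected page is rejected instead of reaching memory, which is the general kind of event the error message reports.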

Potential specific causes include:

Hardware faults in the accelerator card
Driver incompatibility or bugs
Application-level issues with the software using the device
Firmware mismatches between components

Resolution

Step 1: Contact the device vendor

Since this is a hardware-level issue, contact the device vendor (in this case, Intel) for:

Firmware updates
Driver updates
Known issues with your specific hardware model
Device-specific debugging tools

Step 2: Collect diagnostic information

Gather ESXi host logs showing the error:
a. Log in to the vSphere Client
b. Select the affected ESXi host
c. Navigate to Monitor > Logs
d. Generate a support bundle by clicking "Export System Logs", or run vm-support directly on the ESXi host
Document the exact error message, including:
a. The device identifier (e.g., 0000:8b:00.1)
b. The IOMMU fault type
c. The address where the fault occurred
Identify the affected hardware using the ESXi shell:
```
lspci | grep -i accelerator
```
Note the device model, vendor information, and device IDs
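
When many log lines need to be reviewed, the device identifier, fault type, and address can be extracted with a short script. The following is a minimal Python sketch intended to run on a workstation against extracted logs. It assumes the message matches the format quoted in this article; the function name parse_iommu_fault and the sample address are hypothetical.

```python
import re

# Pattern modeled on the message format quoted in this article; the exact
# layout in a real log may differ, so adjust the expression as needed.
FAULT_RE = re.compile(
    r"PCI passthru device "
    r"(?P<device>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"caused an IOMMU fault type (?P<fault_type>\d+) "
    r"at address (?P<address>0x[0-9a-fA-F]+)"
)


def parse_iommu_fault(line):
    """Return device, fault type, and address from a log line, or None."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),
        "fault_type": int(match.group("fault_type")),
        "address": int(match.group("address"), 16),
    }


# Hypothetical sample line; the address value is made up for illustration.
sample = (
    "VMware ESX unrecoverable error: (vmx) PCI passthru device "
    "0000:8b:00.1 caused an IOMMU fault type 4 at address 0x1234000"
)
print(parse_iommu_fault(sample))
```

Lines that do not contain the message return None, so the function can be applied to every line of a log file and the non-matching results discarded.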

Step 3: Work with hardware and application vendors

Schedule a joint troubleshooting session with both the device vendor and application vendor
Implement any recommended diagnostic instrumentation in a lab environment first
Consider rolling back to the previous version of the hardware driver or application software if the issue began after an upgrade

Additional Information

If IOMMU faults occur regularly, check for patterns such as:

Specific workloads or traffic patterns that trigger the issue
Timing patterns (do issues occur at peak usage times?)
Correlation with other system events
Host resource utilization at the time of failure
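
One way to look for timing patterns is to group the fault timestamps by hour of day. The following Python sketch assumes the timestamps have already been collected from the logs and normalized to the form YYYY-MM-DDTHH:MM:SS; the function name faults_per_hour and the sample timestamps are hypothetical and are not taken from real logs.

```python
from collections import Counter
from datetime import datetime


def faults_per_hour(timestamps):
    """Count fault occurrences by hour of day (0-23)."""
    hours = Counter()
    for stamp in timestamps:
        # Assumed timestamp layout; change the format string if yours differs.
        hours[datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S").hour] += 1
    return hours


# Hypothetical timestamps used only to demonstrate the grouping.
observed = ["2025-01-06T14:02:11", "2025-01-07T14:47:30", "2025-01-09T03:15:05"]
for hour, count in sorted(faults_per_hour(observed).items()):
    print(f"{hour:02d}:00  {count} fault(s)")
```

A cluster of faults in the same hour across several days may point to a scheduled workload or a peak usage period worth raising with the vendors.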
