ESXi 8.0 hosts may experience a recurring Purple Screen of Death (PSOD) when using NVIDIA vGPU profiles. The crash occurs when the IOMMU detects an invalid memory translation entry, and the PSOD shows a message similar to the following:
YYYY-MM-DDT HH:MM:SS cpu#:#######)@BlueScreen: IOMMU Fault detected for (vmgfx#/nvidia-gpu) IOaddr: ############ Reason: 0x79 (Invalid Read/Write permission(R=W=0) for second-level paging entry) Domain: ############
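To check whether a saved log or extracted PSOD backtrace contains this signature, the fault lines can be filtered and the reason code pulled out. The following is a minimal sketch, not a supported tool. The function name `iommu_fault_reasons` is hypothetical, and it assumes the message keeps the "IOMMU Fault detected ... Reason: 0x.." layout shown above.

```shell
# Hypothetical helper: read log text on stdin and print the reason code
# (for example 0x79) from every line that reports an IOMMU fault.
# Assumes the "IOMMU Fault detected ... Reason: 0x.." layout shown above.
iommu_fault_reasons() {
  sed -n '/IOMMU Fault detected/s/.*Reason: \(0x[0-9a-fA-F]*\).*/\1/p'
}
```

Usage: `iommu_fault_reasons < saved-log.txt`, where `saved-log.txt` is a placeholder for whichever file holds the PSOD text. A result of `0x79` corresponds to the fault described in this article.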
VMware vSphere ESXi 8.0
GPU Driver: NVD-VMware_ESXi_8.0.0_Driver version 580.126.08-1OEM.800.1.0.20613240
Management Daemon: nvdgpumgmtdaemon version 580.65.05-1OEM.700.1.0.15843807
The NVIDIA 580.x driver series contains a logic error where memory pages assigned to the GPU are occasionally marked with "No Read" and "No Write" permissions (R=W=0) in the second-level page tables.
When the IOMMU hardware processes a DMA (Direct Memory Access) request against one of these pages, it raises a fault, which crashes the ESXi host.
esxcli software vib list | grep -i nvidia
580.126.08 and 580.65.05 are installed.
IOMMU Fault 0x79" for ESXi 8.0
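The version check can be narrowed to the 580.x series with a small filter. This is a sketch only. The function name `nvidia_580_vibs` is hypothetical, and it assumes that `esxcli software vib list` prints the VIB name in the first column and the version in the second, and that the relevant VIBs can be recognized by "nvidia" anywhere in the line or by a name starting with "NVD" or "nvd", as in the names listed above.

```shell
# Hypothetical helper: read `esxcli software vib list` output on stdin and
# print the name and version of NVIDIA VIBs in the 580.x series.
# Assumes column 1 is the VIB name and column 2 is the version.
nvidia_580_vibs() {
  awk 'tolower($0) ~ /nvidia|^nvd/ && $2 ~ /^580\./ { print $1, $2 }'
}
```

Usage: `esxcli software vib list | nvidia_580_vibs`. If nothing is printed, no 580.x NVIDIA VIB matched under these assumptions.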