ESXi host fails with PSOD when starting a VM configured with an NVIDIA PCIe GPU as a PCI Pass-Through device

Article ID: 424738

Products

VMware vSphere ESXi

Issue/Introduction

  • The ESXi host experiences a Purple Diagnostic Screen (PSOD) shortly after a Virtual Machine is powered on.
  • The Virtual Machine is configured with an NVIDIA PCIe GPU as a PCI Pass-Through device.
  • The logs in /var/run/log/LogEFI.log indicate a Page Fault (Exception 14) occurred within the nvidia kernel module:
    XXXX-XX-XXTXX:XX:XX.XXXZ In(14) LogEFI: cpu61:40457370)VMware ESXi 8.0.3 [Releasebuild-24022510 x86_64]
    XXXX-XX-XXTXX:XX:XX.XXXZ In(14) LogEFI[2099381]: #PF Exception 14 in world 40457370:vmx IP 0x42001e14266b addr 0x9f
    XXXX-XX-XXTXX:XX:XX.XXXZ In(14) LogEFI[2099381]: PTEs:0xaed5bdc027;0xae709ba027;0x0;
    XXXX-XX-XXTXX:XX:XX.XXXZ In(14) LogEFI[2099381]:
    XXXX-XX-XXTXX:XX:XX.XXXZ In(14) LogEFI[2099381]: Module(s) involved in panic: [nvidia 570.158.02 (External)]
    :::
    XXXX-XX-XXTXX:XX:XX.XXXZ In(14) LogEFI: cpu61:40457370)Code start: 0x42001ce00000 VMK uptime: 165:03:53:09.926
    XXXX-XX-XXTXX:XX:XX.XXXZ In(14) LogEFI: cpu61:40457370)0x453b0f29aee0:[0x42001e14266b]_nv040484rm@(nvidia)#<None>+0x14b stack: 0x1
    XXXX-XX-XXTXX:XX:XX.XXXZ In(14) LogEFI: cpu61:40457370)base fs=0x0 gs=0x42004f400000 Kgs=0x0
  • Just before the crash, a power-on operation was performed for a Virtual Machine configured with a PCI Pass-Through device.
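
When comparing several hosts or crashes, it can help to pull the identifying fields out of the log lines shown above. The following is a minimal sketch, not a supported tool: the regular expressions and the function name parse_psod_signature are assumptions based only on the excerpt in this article, and other ESXi builds or drivers may format these lines differently.

```python
import re

# Assumed patterns, derived from the log excerpt in this article only.
FAULT_RE = re.compile(
    r"#PF Exception (?P<exception>\d+) in world (?P<world_id>\d+):(?P<world_name>\S+) "
    r"IP (?P<ip>0x[0-9a-fA-F]+) addr (?P<addr>0x[0-9a-fA-F]+)"
)
MODULE_RE = re.compile(
    r"Module\(s\) involved in panic: \[(?P<name>\S+) (?P<version>\S+)"
)


def parse_psod_signature(lines):
    """Collect the fault and module fields found in the given log lines."""
    signature = {}
    for line in lines:
        fault = FAULT_RE.search(line)
        if fault:
            signature.update(fault.groupdict())
        module = MODULE_RE.search(line)
        if module:
            signature["module"] = module.group("name")
            signature["module_version"] = module.group("version")
    return signature


# Two lines copied from the excerpt above.
sample = [
    "XXXX-XX-XXTXX:XX:XX.XXXZ In(14) LogEFI[2099381]: #PF Exception 14 "
    "in world 40457370:vmx IP 0x42001e14266b addr 0x9f",
    "XXXX-XX-XXTXX:XX:XX.XXXZ In(14) LogEFI[2099381]: Module(s) involved "
    "in panic: [nvidia 570.158.02 (External)]",
]
result = parse_psod_signature(sample)
print(result)
```

The extracted module name and version (here nvidia 570.158.02) are the values to check against the compatibility documentation when working through the resolution steps.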

Environment

VMware vSphere ESXi

Cause

The PSOD is triggered by the NVIDIA driver kernel module installed on the ESXi host.
The backtrace indicates that _nv040484rm@(nvidia) was the function executing when the failure occurred.
#PF Exception 14 is the x86 page-fault exception; here it indicates that the driver attempted to access an invalid memory address (addr 0x9f).
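
As a rough aid to reading the fault address, the sketch below checks whether it lies within the first 4 KiB page of the address space. An address that low is commonly the result of dereferencing a NULL pointer at a small offset, but this is only a heuristic and does not identify the defect in the driver. The helper name looks_like_null_offset and the 4 KiB page-size constant are assumptions made for this illustration.

```python
PAGE_SIZE = 4096  # assumed x86_64 base page size


def looks_like_null_offset(fault_addr: int) -> bool:
    """Return True if the address lies in the first page (heuristic only)."""
    return 0 <= fault_addr < PAGE_SIZE


# Fault address taken from the "#PF Exception 14 ... addr 0x9f" log line.
addr = int("0x9f", 16)
is_low = looks_like_null_offset(addr)
print(hex(addr), addr, is_low)  # prints: 0x9f 159 True
```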

Resolution

Please consider the following action plan to resolve the issue:

  • Verify Compatibility
    Ensure that the installed NVIDIA driver version is compatible with your specific ESXi version and the server hardware.
    Refer to the "Broadcom Compatibility Guide" and the NVIDIA vGPU software release notes.

  • Update GPU Driver and Firmware
    Check whether a newer NVIDIA driver or GPU firmware release is available that addresses this specific PSOD, and apply it if so.

  • Perform Hardware Diagnostics
    Run hardware diagnostics on the GPU device to rule out any underlying physical failures.
    Please contact your hardware vendor for assistance with diagnostics if necessary.

If the issue persists after performing the steps above, collect a vm-support log bundle and contact both Broadcom Support and your hardware vendor's support team.

Additional Information

Japanese KB: NVIDIA PCIe GPU を PCI パススルー デバイスとして構成した VM の起動時に ESXi ホストで PSOD が発生する