Powering on a Windows 10 VM with NVIDIA RTX 6000 Blackwell GPU results in PSOD on ESXi 9.0.1.0(24957456)


Article ID: 421417


Products

VMware vSphere ESXi

Issue/Introduction

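The ESXi host fails with a purple screen of death (PSOD) when powering on a VM that has an NVIDIA RTX GPU attached in passthrough mode. The backtrace contains entries similar to the following: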

    0x45396cddbc98:[0x4200211a07ee]SPLockWork@vmkernel#nover+0x1a stack: 0x0, 0x11300000000, 0x430dd2804790, 0x0, 0x0
    0x45396cddbca0:[0x42002119c1bd]SemaphoreLockInt@vmkernel#nover+0x2e stack: 0x11300000000, 0x430dd2804790, 0x0, 0x0, 0xc
    0x45396cddbcf0:[0x420021103ae6]PCIEHP_RestoreHPInterruptPostReset@vmkernel#nover+0x7b stack: 0x1, 0x430dd2804790, 0x1, 0x0, 0x1
    0x45396cddbd30:[0x4200211052c2]PCI_ResetSubTopology@vmkernel#nover+0x207 stack: 0x430dd28051a0, 0x4320ca601430, 0x45396cddf000, 0x0, 

 

Environment

ESX 9.0.1

 

Cause

  • The graphics device reports that its slot is PCIe hot-plug capable, but it supports neither hot-plug nor surprise hot-plug.
  • The PCIEHPController data structure is not initialized for the device. However, the structure is still accessed when the device is reset during power-on of the VM to which it is attached as a passthrough device.

Resolution

Follow the steps below.

NOTE: Setting enablePCIEHotplug=FALSE prevents ESXi from enabling hot-plug during server boot, even if the hardware supports it.

  1. Disable PCIe Hot-Plug by running the following command on the ESXi host:

    esxcli system settings kernel set -s "enablePCIEHotplug" -v "FALSE"

  2. Modify the passthrough configuration

    • Back up the file:
      cp /etc/vmware/passthru.map /etc/vmware/passthru.map.bak

    • Edit the file using vi:
      vi /etc/vmware/passthru.map
    • Locate the line:

      # NVIDIA (FLR issue on Ampere and Hopper GPUs)
      10de ffff bridge false

    • Change it to:
       
      # NVIDIA (FLR issue on Ampere and Hopper GPUs)
      10de ffff default false

  3. Reboot the ESXi host to apply the changes.
  4. Verify that PCIe device hot-plug is disabled by entering the command:

    esxcli system settings kernel list -o enablePCIEHotplug

    The value FALSE should be displayed in the Configured column:

    Name               Type  Configured  Runtime  Default  Description
    -----------------  ----  ----------  -------  -------  -----------
    enablePCIEHotplug  Bool  FALSE       FALSE    TRUE     Enable PCI-E Native Hotplug support
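
The file edit in step 2 and the check in step 4 could also be scripted. The following is a minimal sketch, not part of the official procedure. The function names patch_passthru_map and hotplug_configured_false are hypothetical helpers, and the sketch assumes that the columns in passthru.map are separated by spaces or tabs and that the esxcli output has the column layout shown above.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, written for POSIX sh.

# Step 2: back up the given passthru.map, then change the NVIDIA
# "10de ffff bridge false" entry to "10de ffff default false".
# Returns non-zero if the file does not contain the expected entry afterwards.
patch_passthru_map() {
    file=$1
    ws='[[:space:]][[:space:]]*'   # one or more spaces or tabs (assumed separator)

    # Same backup as in the manual procedure.
    cp "$file" "$file.bak" || return 1

    # Rewrite only the reset-method column; all other lines pass through unchanged.
    sed "s/^\(10de${ws}ffff${ws}\)bridge\(${ws}false\)/\1default\2/" \
        "$file.bak" > "$file" || return 1

    # Confirm that the new entry is present.
    grep -q "^10de${ws}ffff${ws}default${ws}false" "$file"
}

# Step 4: read the output of
#   esxcli system settings kernel list -o enablePCIEHotplug
# on stdin and succeed only if the Configured column (third field) is FALSE.
hotplug_configured_false() {
    awk '$1 == "enablePCIEHotplug" { found = 1; ok = ($3 == "FALSE") }
         END { exit !(found && ok) }'
}
```

On the host, the helpers might be used as follows, with the reboot from step 3 still performed between the two calls:

    patch_passthru_map /etc/vmware/passthru.map
    esxcli system settings kernel list -o enablePCIEHotplug | hotplug_configured_false && echo "hot-plug disabled"

The edit is written to the original file from the backup copy rather than with an in-place sed option, so the sketch does not depend on which sed variant is installed.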