"Could not initialize plugin 'libnvidia-vgx.so' for vGPU 'profile_name' Failed to start the virtual machine. Module DevicePowerOn power on failed." error when powering on a VM with vGPU device



Article ID: 393089


Updated On: 04-04-2025

Products

VMware vSphere ESXi

Issue/Introduction

  • A virtual machine with an NVIDIA GPU device attached fails to power on.
  • In vCenter Server, under Host -> Configure -> Hardware -> Graphics, the NVIDIA graphics memory shows 0B.
  • After the ESXi host is rebooted, the NVIDIA graphics memory shows the correct value and the VM powers on successfully.
  • The issue recurs after some time when the system is under heavy workload.

 

Environment

VMware vSphere ESXi 8.x

Cause

  • In /var/run/log/vmkernel.log, the following error messages are seen:
    YYYY-MM-DDTHH:MM:SSZ In(182) vmkernel: cpu109:2097261)NVRM: GPU at PCI:0000:##:00: GPU-84286ac8-7383-648f-fc3d-############
    YYYY-MM-DDTHH:MM:SSZ In(182) vmkernel: cpu66:2246570)NVRM: Xid (PCI:0000:##:00): 120, pid='<unknown>', name=<unknown>, GSP task panic: unknown error (0) @ pc:0x10009##, aux:0x0, partition:2#4, task:1
    YYYY-MM-DDTHH:MM:SSZ In(182) vmkernel: cpu109:2097261)NVRM: Xid (PCI:0000:##:00): 31, pid='<unknown>', name=<unknown>, Ch 000000##
  • In the vmware.log of the affected VM, the following error messages are seen:
    2024-12-03T01:43:10.190Z Er(02) vmx - vmiop_log: (0x0): Failed to get GPU info
    2024-12-03T01:43:10.190Z Er(02) vmx - vmiop_log: (0x0): Initialization: Failed to get GPU info error 2
    2024-12-03T01:43:10.190Z Er(02) vmx - vmiop_log: (0x0): init_device_instance failed for inst 0 with error 2 (unable to setup host connection state)
  • The issue has been reported on NVIDIA L40 devices.
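
To check whether a host log contains the same signature, the NVRM Xid lines can be extracted from a copy of vmkernel.log. The following is a minimal Python sketch, not an official VMware or NVIDIA tool. The regular expression is modeled only on the log lines quoted above, and the function name find_xid_events and the sample data are illustrative assumptions; the line layout may differ between driver versions.

```python
import re
from collections import Counter

# Assumption: pattern derived from the NVRM Xid lines quoted in this article.
# It captures the PCI address in parentheses and the numeric Xid code.
XID_RE = re.compile(r"NVRM: Xid \((?P<pci>PCI:[^)]+)\): (?P<xid>\d+),")


def find_xid_events(lines):
    """Return a list of (pci_address, xid_code) for each NVRM Xid line."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append((match.group("pci"), int(match.group("xid"))))
    return events


# Sample lines copied from the Cause section (placeholders left as-is).
SAMPLE_LINES = [
    "YYYY-MM-DDTHH:MM:SSZ In(182) vmkernel: cpu109:2097261)NVRM: GPU at "
    "PCI:0000:##:00: GPU-84286ac8-7383-648f-fc3d-############",
    "YYYY-MM-DDTHH:MM:SSZ In(182) vmkernel: cpu66:2246570)NVRM: Xid "
    "(PCI:0000:##:00): 120, pid='<unknown>', name=<unknown>, GSP task panic: "
    "unknown error (0) @ pc:0x10009##, aux:0x0, partition:2#4, task:1",
    "YYYY-MM-DDTHH:MM:SSZ In(182) vmkernel: cpu109:2097261)NVRM: Xid "
    "(PCI:0000:##:00): 31, pid='<unknown>', name=<unknown>, Ch 000000##",
]

events = find_xid_events(SAMPLE_LINES)
# Count how often each (device, Xid code) pair appears.
xid_counts = Counter(events)
```

To scan a real log, pass an open file object instead of the sample list, for example the lines of a copy of /var/run/log/vmkernel.log. The resulting per-device Xid counts can be included when reporting the issue to NVIDIA.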

Resolution

  • The power-on failure is a result of the NVIDIA device errors reported in the vmkernel log. These errors indicate a firmware or hardware issue on the NVIDIA device.
  • Ensure the NVIDIA driver is updated to the latest version to avoid running into a known issue.
  • Contact NVIDIA to determine what is causing these Xid errors.

 

Additional Information