This article provides a workaround for a vGPU VM power-on failure that some users may encounter after an ESXi upgrade.
In vSphere 8.0u1 and later, the gpuManager service should start automatically with the host when using NVIDIA vGPU features such as Multi-Instance GPU or NVSwitch.
In some ESXi upgrade scenarios (for example, upgrading from ESXi 8.0 GA build 20513097 or ESXi 8.0a Patch build 20842819 to ESXi 8.0u1 or later), however, the service may not start automatically. This can cause a vGPU VM to fail to power on. If you plan to use these NVIDIA vGPU features, double-check that the startup policy of the gpuManager service is set to "Start and stop with host."
Currently there is no resolution.
If a vGPU VM fails to power on on an upgraded ESXi host that supports NVIDIA Multi-Instance GPU or NVSwitch, check the startup policy of the gpuManager service and ensure it is set to "Start and stop with host."
Below are the steps for the vCenter UI and the ESXi command line:
vCenter UI:
From the vCenter UI, select the ESXi host.
Choose the 'Configure' tab, then click 'Services'.
Find 'gpuManager' in the service list and check its Startup Policy.
Click 'EDIT STARTUP POLICY' and set the startup policy to "Start and stop with host".
ESXi command line:
Run the 'chkconfig --list gpuManager' command to check the service startup policy.
An output of "on" means the startup policy is "Start and stop with host"; an output of "off" means it is "Start and stop manually".
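The startup policy can also be changed from the ESXi shell with chkconfig. A minimal sketch, assuming SSH or the ESXi shell is enabled and that chkconfig accepts the service name with an on/off argument, as it does for other ESXi services:

```shell
# Check the current startup policy ("on" = Start and stop with host,
# "off" = Start and stop manually)
chkconfig --list gpuManager

# Set the startup policy to "Start and stop with host"
chkconfig gpuManager on

# Verify that the change took effect
chkconfig --list gpuManager
```

These commands run on the ESXi host itself, not on the vCenter appliance.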
After changing the startup policy by following the steps above, reboot the ESXi host. The gpuManager service should start automatically after the reboot, and the vGPU VM should be able to power on.
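After the reboot, the service state can be confirmed from the ESXi shell. A sketch, assuming gpuManager has a standard /etc/init.d service script like other ESXi services:

```shell
# Confirm the gpuManager service is running after the reboot
/etc/init.d/gpuManager status

# If it is not running, it can be started manually for the current boot
/etc/init.d/gpuManager start
```

If the service still fails to start automatically on subsequent reboots, re-check the startup policy as described above.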
Note:
If the ESXi host supports NVIDIA Multi-Instance GPU or NVSwitch, the gpuManager service should run automatically. If the ESXi host does not support these features, the gpuManager service is not needed and will not run.