vMotion with vGPU failed on "Insufficient resources" during compatibility check

Article ID: 321990

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptoms:

  • The virtual machine is configured with vGPU.
  • vMotion with vGPU is enabled in vCenter Server.
  • When a destination ESXi host is selected while initiating vMotion, the compatibility check fails with the following messages:
Insufficient resources
One or more devices (pciPassthru0) required by VM vmtest are not available on host host1
  • The "nvidia-smi" command shows that the destination ESXi host has enough free GPU resources.
  • The GPUs in the source and destination ESXi hosts are the same model.



Environment

VMware vCenter Server 7.0.x
VMware vCenter Server 8.0.x

Cause

vMotion of a vGPU VM fails if the NVIDIA GPU ECC mode differs between the source and destination ESXi hosts.

Resolution

To confirm whether GPU ECC mode is enabled or disabled on the ESXi hosts:
1. Run the command "nvidia-smi" on both hosts. The picture below shows sample output.

The value circled in red (the "Volatile Uncorr. ECC" column) is the ECC mode:
* 0 = ENABLED
* Off = DISABLED

Both ESXi hosts must use the same ECC mode. If they differ, change one host to match the other.

2. Change the ECC mode by running the following command on the ESXi host:
# nvidia-smi --ecc-config=ENABLED|DISABLED

For example, to disable ECC mode:
# nvidia-smi --ecc-config=DISABLED

3. Reboot the ESXi host for the change to take effect.
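
The comparison in step 1 can also be scripted rather than read from the table output. The following is a minimal sketch, not a supported procedure. It assumes the nvidia-smi build on the host accepts the --query-gpu option with the ecc.mode.current field, and that its output has been captured from each host. The helper name ecc_modes_match is hypothetical, and the literal values in the example stand in for real captured output.

```shell
#!/bin/sh
# Sketch: compare the ECC mode reported by two ESXi hosts.
#
# Assumption: nvidia-smi on the host supports --query-gpu.
# Capture one value per GPU on each host with:
#   nvidia-smi --query-gpu=ecc.mode.current --format=csv,noheader
# The query prints values such as "Enabled" or "Disabled".

# ecc_modes_match SRC DST  (hypothetical helper name)
# Returns 0 when both captured outputs contain the same set of ECC modes.
ecc_modes_match() {
    # Strip spaces and carriage returns, then reduce to the unique modes.
    src=$(printf '%s\n' "$1" | tr -d ' \r' | sort -u)
    dst=$(printf '%s\n' "$2" | tr -d ' \r' | sort -u)
    [ "$src" = "$dst" ]
}

# Example with literal values standing in for the two captured outputs.
if ecc_modes_match "Enabled" "Disabled"; then
    echo "ECC modes match"
else
    echo "ECC modes differ: align them before retrying vMotion"
fi
```

After running the --ecc-config command in step 2 and before the reboot in step 3, the ecc.mode.pending query field (same assumption about --query-gpu support) should show the new setting while ecc.mode.current still shows the old one.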

Additional Information

This issue was observed with driver version:
NVD-VMware_ESXi_7.0.2_Driver   -  525.125.03-1OEM.702.0.0.17630552

The problem was not seen after upgrading the driver to version:
NVD-VMware_ESXi_8.0.0_Driver  -  525.125.03-1OEM.800.1.0.20613240