vMotion is not allowed when Unified Virtual Memory (UVM) is enabled on a VMware Private AI Foundation with NVIDIA (PAIF-N) enabled cluster

Article ID: 426543

Updated On:

Products

VMware vSphere ESXi
VMware vCenter Server
VCF Private AI Services

Issue/Introduction

Placing a host into maintenance mode does not complete on clusters where the hosts have GPUs configured, because the GPU-enabled VMs cannot be migrated off the host with vMotion.

The vpxd logs on vCenter Server show the following error:

 [Originator@6876 sub=VmCheck item=DetectAndFixPolicyViolations opID=WorkQueue-185c9f38] CompatCheck results: (vim.vm.check.Result) [
-->    (vim.vm.check.Result) {
-->       vm = 'vim.VirtualMachine:',
-->       host = 'vim.HostSystem:8',
-->       warning = (vmodl.MethodFault) [
-->          (vim.fault.MigrationFault) {
-->             faultMessage = (vmodl.LocalizableMessage) [
-->                (vmodl.LocalizableMessage) {
-->                   key = "com.vmware.vim.vpxd.vmcheck.vgpuRelocateOrCloneWarning",
-->                }
-->             ],
-->             msg = ""
-->          },
-->          (vim.fault.MigrationFault) {
-->             faultMessage = (vmodl.LocalizableMessage) [
-->                (vmodl.LocalizableMessage) {
-->                   key = "com.vmware.vim.vpxd.vmcheck.vgpuRelocateOrCloneWarning",
-->                }
-->             ],
-->             msg = ""
-->          }
-->       ],
-->       error = (vmodl.MethodFault) [
-->          (vim.fault.MigrationFeatureNotSupported) {
-->             faultMessage = (vmodl.LocalizableMessage) [
-->                (vmodl.LocalizableMessage) {
-->                   key = "com.vmware.vim.vpxd.vmcheck.vgpuMigrateNotSupported",
-->                }
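
To confirm that a failed evacuation matches this article, the message keys shown in the excerpt above can be searched for in a saved copy of the vpxd log. The following is a minimal sketch, not a supported tool: the function name count_vgpu_faults is hypothetical, and the regular expression assumes the keys appear in the same form as in the excerpt.

```python
import re
from collections import Counter

# Matches the vGPU-related vmcheck message keys seen in the excerpt above,
# e.g. key = "com.vmware.vim.vpxd.vmcheck.vgpuMigrateNotSupported"
VGPU_KEY = re.compile(r'key = "com\.vmware\.vim\.vpxd\.vmcheck\.(vgpu\w+)"')


def count_vgpu_faults(lines):
    """Count vGPU vmcheck message keys in an iterable of log lines."""
    return Counter(
        match.group(1)
        for line in lines
        for match in VGPU_KEY.finditer(line)
    )
```

Passing an open log file to count_vgpu_faults returns a count per key. A non-zero count for vgpuMigrateNotSupported suggests the compatibility check rejected the migration for the reason described in this article.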

Environment

vSphere Kubernetes Service
VMware Private AI Foundation with NVIDIA

Cause

This is by design. NVIDIA disables vMotion support when UVM is enabled.
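
One way to find affected VMs ahead of a maintenance window is to check each VM's advanced settings for the UVM plugin parameter. The sketch below is an illustration under assumptions, not an official procedure. It assumes UVM is enabled per vGPU through an advanced setting of the form pciPassthru<N>.cfg.enable_uvm = 1, as described in NVIDIA's vGPU documentation (verify this against the release in use). The function name uvm_enabled is hypothetical, and the input is a plain mapping of setting names to values, which could be built from a VM's advanced configuration.

```python
import re

# Assumption: UVM is switched on per vGPU with an advanced VM setting named
# pciPassthru<N>.cfg.enable_uvm and the value 1 (check NVIDIA's documentation).
UVM_KEY = re.compile(r"^pciPassthru\d+\.cfg\.enable_uvm$")


def uvm_enabled(extra_config):
    """Return True if any vGPU entry in the settings mapping has UVM set to 1."""
    return any(
        UVM_KEY.match(key) is not None and str(value).strip() == "1"
        for key, value in extra_config.items()
    )
```

VMs for which this returns True would be the ones that cannot be moved with vMotion and need to be handled as described under Resolution.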

Resolution

There are two options, depending on workload requirements:

1. Create a Custom VM Class for NVIDIA vGPU Devices

The steps are described in "Create a Custom VM Class for NVIDIA vGPU Devices".

2. Continue using passthrough (DirectPath I/O), but remove the dependency on vMotion

  • If workloads strictly require full bare-metal GPU access, vMotion cannot be used.
  • In this case:
    • Manually power off the VM(s) using GPU passthrough before placing the ESXi host into Maintenance Mode.
    • Proceed with the ESXi upgrade.
    • Power on the VM(s) after the upgrade completes.
  • Note: This approach does not provide vMotion, HA restart, or DRS balancing for these VMs. Availability must be managed at the application layer or through alternative recovery mechanisms.
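
The manual sequence in option 2 can be sketched as a small script. This is a hedged outline rather than a supported automation: it assumes the host and VM objects are pyVmomi-style managed objects (exposing PowerOffVM_Task, PowerOnVM_Task, and EnterMaintenanceMode_Task), and that wait is a caller-supplied function that blocks until a task completes, such as pyVim.task.WaitForTask. The function names are hypothetical.

```python
def power_off_and_enter_maintenance(host, gpu_vms, wait):
    """Power off the GPU VMs on a host, then request maintenance mode.

    Returns the VMs that were actually powered off so that the same set
    can be powered on again after the upgrade.
    """
    stopped = []
    for vm in gpu_vms:
        if vm.runtime.powerState == "poweredOn":
            # PowerOffVM_Task is a hard power-off of the VM.
            wait(vm.PowerOffVM_Task())
            stopped.append(vm)
    # timeout=0 means the request itself does not time out.
    wait(host.EnterMaintenanceMode_Task(timeout=0))
    return stopped


def power_on(vms, wait):
    """Power the previously stopped VMs back on once the upgrade is done."""
    for vm in vms:
        wait(vm.PowerOnVM_Task())
```

Keeping the list returned by the first function avoids powering on VMs that were already off before maintenance started. The host must be taken out of maintenance mode before power_on is called.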

Additional Information

A similar issue for SDDC is covered in KB 409562.

NVIDIA documentation on UVM 

Reference: NVIDIA AI Enterprise User Guide