When placing an ESXi host in maintenance mode, DRS may fail to automatically migrate VMs that have vGPU enabled.
On the cluster's Monitor tab, the "vSphere DRS" -> "Recommendations" section may show a VM migration recommendation for vGPU VMs with the reason "Destination host is selected for fixing policy/rule violation or its healthy state is better".
Manually migrating vGPU-enabled VMs with vMotion works as expected.
This is expected behaviour at this time. Per the documentation, DRS will only complete initial placement of a VM with vGPU, but will not automatically load balance:
Using vMotion to Migrate vGPU Virtual Machines (vmware.com)
"DRS supports initial placement of vGPU VMs running vSphere 6.7 Update 1 and later without load balancing support"
Starting with vSphere 8.0 U2, DRS can estimate the Stun Time for a given vGPU VM configuration. When the DRS Cluster Advanced Options are set and the Estimated VM Devices Stun Time for a VM is lower than the VM Devices vMotion Stun Time limit, DRS will automate VM migrations.
To enable this functionality, make sure the infrastructure meets the following requirements:
Then add the following DRS Cluster Advanced Options:
Option: PassthroughDrsAutomation
Value: 1
Option: LBMaxVmotionPerHost
Value: 1
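The rule described above can be sketched as a small check. This is only a simplified model of the two documented conditions (the advanced options are set, and the estimated stun time is below the limit), not DRS's actual implementation. The function name drs_may_automate and the dictionary representation of the cluster options are assumptions made for illustration.

```python
# Simplified model of the documented rule; NOT the real DRS algorithm.
# The option names come from this article; everything else is hypothetical.
REQUIRED_OPTIONS = {
    "PassthroughDrsAutomation": "1",
    "LBMaxVmotionPerHost": "1",
}


def drs_may_automate(estimated_stun_s, stun_limit_s, cluster_options):
    """Return True when both documented conditions hold.

    estimated_stun_s: the VM's Estimated VM Devices Stun Time, in seconds.
    stun_limit_s:     the VM Devices vMotion Stun Time limit, in seconds.
    cluster_options:  dict of DRS Cluster Advanced Options (key -> value).
    """
    # Condition 1: both advanced options are present with the value 1.
    options_set = all(
        str(cluster_options.get(key)) == value
        for key, value in REQUIRED_OPTIONS.items()
    )
    # Condition 2: the estimate is strictly lower than the limit.
    return options_set and estimated_stun_s < stun_limit_s
```

For example, with both options set and a limit of 100 seconds, a VM with an estimated stun time of 40 seconds passes the check, while one with an estimate of 120 seconds does not.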
For vGPU VMs with Stun Times exceeding the "vMotion Stun Time Limit" (default 100 seconds), a VI Admin can add the following DRS Cluster Advanced Option:
Option: VmDevicesStunTimeTolerated
Value: <number of seconds, greater than any VM's Estimated Stun Time in the Cluster> (Default 100 seconds)
OR
Modify the "vMotion Stun Time Limit" in the VM's Configuration -> "VM Options" Tab -> "Advanced" Section
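As a rough aid for choosing a VmDevicesStunTimeTolerated value, the sketch below picks a whole number of seconds greater than the largest Estimated Stun Time in the cluster. The function name, the list of estimates, and the margin_s safety margin are assumptions for illustration only; the estimates themselves must be collected from the environment.

```python
import math


def suggested_stun_time_tolerated(estimated_stun_times_s, margin_s=10):
    """Suggest a value greater than every VM's Estimated Stun Time.

    estimated_stun_times_s: estimated stun times in seconds, one per vGPU VM.
    margin_s: hypothetical safety margin in whole seconds (must be >= 1 so
              the result is strictly greater than the largest estimate).
    """
    if not estimated_stun_times_s:
        raise ValueError("at least one estimated stun time is required")
    if margin_s < 1:
        raise ValueError("margin_s must be at least 1 second")
    # Round the largest estimate up to whole seconds, then add the margin.
    return math.ceil(max(estimated_stun_times_s)) + margin_s
```

With estimates of 35.2, 120.0, and 98.7 seconds and the default margin, the suggested value is 130 seconds.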
For older releases, follow the workaround below.
Workaround:
To work around this issue, manually migrate vGPU-enabled VMs to another host.