vSphere Lifecycle Manager is unable to place a host into maintenance mode while a vGPU Virtual Machine is running on that host.
“PRE-CHECK” returns the following message:
Virtual machine ... that runs on host '...' reported an issue which prevents entering maintenance mode...
“REMEDIATE ALL” returns the following error:
Entering maintenance mode failed for host ...
For vSphere Lifecycle Manager with vCenter 8.0 and newer, a VI Admin can enable DRS automation using the DRS Cluster Advanced Option described in vGPU Virtual Machine automated migration for Host Maintenance Mode in a DRS Cluster.
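For illustration only, the following pyVmomi (Python) sketch shows one way a DRS Cluster Advanced Option can be added programmatically. The vCenter address, credentials, cluster name, and especially the option key and value are placeholders; substitute the exact option documented in the article referenced above.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_cluster(content, name):
    # Walk the inventory for a ClusterComputeResource with the given name.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    try:
        return next(c for c in view.view if c.name == name)
    finally:
        view.DestroyView()

ctx = ssl._create_unverified_context()  # lab use only; validate certificates in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="***", sslContext=ctx)
try:
    cluster = find_cluster(si.RetrieveContent(), "GPU-Cluster")
    # Placeholder key/value: replace with the DRS Cluster Advanced Option named
    # in the referenced article.
    drs_config = vim.cluster.DrsConfigInfo(
        option=[vim.option.OptionValue(key="ExampleVgpuAutomationOption", value="1")])
    spec = vim.cluster.ConfigSpecEx(drsConfig=drs_config)
    cluster.ReconfigureComputeResource_Task(spec, modify=True)
finally:
    Disconnect(si)

The equivalent change can also be made interactively in the vSphere Client through the cluster's vSphere DRS Advanced Options editor, which is the workflow described in the referenced article.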
Remarks:
There must be spare vGPU host capacity in the Cluster for running vGPU Virtual Machines to migrate to during upgrade.
It is recommended to place vGPU Virtual Machines on shared storage to expedite migration.
Refer to the vendor's compatibility matrix for host driver upgrades.
Upgrading hosts with running vGPU Virtual Machines using vSphere Update Manager, or vSphere Lifecycle Manager with vSphere 7.x and older, is NOT supported. A VI Admin can use the following Migration options to allow for Host Maintenance Mode (a sketch of these operations follows the list):
vMotion the vGPU Virtual Machines
Suspend the vGPU Virtual Machines
Power Off the vGPU Virtual Machines
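A minimal pyVmomi (Python) sketch of these three options, assuming an existing service-instance connection si, a vGPU Virtual Machine named "vgpu-vm-01", and a destination host "esx02.example.com" with spare vGPU capacity (all names are hypothetical):

from pyVmomi import vim

def find_obj(content, vimtype, name):
    # Locate an inventory object of the given type by name.
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

def evacuate_vgpu_vm(vm, mode, target_host=None):
    # Apply exactly one of the three options to move the running vGPU VM off the host.
    if mode == "vmotion":
        return vm.MigrateVM_Task(host=target_host,
                                 priority=vim.VirtualMachine.MovePriority.defaultPriority)
    if mode == "suspend":
        return vm.SuspendVM_Task()
    if mode == "poweroff":
        return vm.PowerOffVM_Task()  # or vm.ShutdownGuest() for a guest-initiated shutdown
    raise ValueError("mode must be 'vmotion', 'suspend', or 'poweroff'")

content = si.RetrieveContent()
vm = find_obj(content, vim.VirtualMachine, "vgpu-vm-01")
host = find_obj(content, vim.HostSystem, "esx02.example.com")
task = evacuate_vgpu_vm(vm, "vmotion", target_host=host)

Each call returns a vCenter task; wait for the task to complete before placing the host into Maintenance Mode.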
For Virtual Machines whose expected stun time exceeds the vMotion timeout, the following options allow for Host Maintenance Mode:
Suspend the vGPU Virtual Machines
Power Off the vGPU Virtual Machines
Adjust the vMotion timeout to account for the Virtual Machine's Expected Worst-Case Stun Time (see the references and the sketch below).
vMotion Timeout: vMotion or Storage vMotion of a VM fails with the error: The migration has exceeded the maximum switchover time of 100 second(s).
Estimated Worst-Case Stun Times: Virtual Machine Conditions and Limitations for vSphere vMotion
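For the timeout adjustment, the per-VM option commonly associated with the 100-second switchover error quoted above is vmotion.maxSwitchoverSeconds; confirm the exact setting and a suitable value (at least the Expected Worst-Case Stun Time) against the referenced articles. A minimal pyVmomi (Python) sketch, assuming an existing connection and a vm object obtained as in the earlier sketch:

from pyVmomi import vim

def set_switchover_timeout(vm, seconds):
    # Raise the per-VM vMotion switchover timeout via the VM's extra config.
    spec = vim.vm.ConfigSpec(extraConfig=[
        vim.option.OptionValue(key="vmotion.maxSwitchoverSeconds", value=str(seconds))])
    return vm.ReconfigVM_Task(spec)

# Example: allow a 300-second switchover for a VM with a long expected stun time.
task = set_switchover_timeout(vm, 300)

Depending on the vSphere version, the new value may only take effect after the Virtual Machine is power-cycled; verify the behavior against the vMotion Timeout article above.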