"VMware Update Manager remediations fail for 2-host clusters" after upgrading vCenter to 6.5 or later

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

Symptoms:
After upgrading VMware vCenter to version 6.5 or later:

Patching ESXi hosts at the cluster level or higher with VMware Update Manager (VUM) fails if HA is enabled and the cluster contains only two hosts.
In the /var/log/vmware/vmware-updatemgr/vum-server/vmware-vum-server-log4cpp.log file, you see entries similar to:

[2018-01-19 23:43:33:006 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 INFO] [vciClusterJobSchedulerTask, 972] No of hosts being passed to DRS API : 2
[2018-01-19 23:43:33:006 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 INFO] [vciClusterJobSchedulerTask, 1012] Using Demand capacity ratio target value 150
[2018-01-19 23:43:33:006 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 INFO] [vciClusterJobSchedulerTask, 1020] Calling DRS API for enter maintenance mode
[2018-01-19 23:43:33:013 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 ERROR] [vciClusterJobSchedulerTask, 1035] DRS API returned faults
[2018-01-19 23:43:33:013 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 ERROR] [vciClusterJobSchedulerTask, 1040] No current remediation is going on so address the faults
[2018-01-19 23:43:33:013 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 ERROR] [vciClusterJobSchedulerTask, 1044] No of returned faults : 4
[2018-01-19 23:43:33:014 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 ERROR] [vciClusterJobSchedulerTask, 1053] Name of VM that caused fault : Test1
[2018-01-19 23:43:33:016 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 INFO] [vciClusterJobSchedulerTask, 1071] DRS API indicates no active host available in cluster. Discarding fault.
[2018-01-19 23:43:33:017 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 ERROR] [vciClusterJobSchedulerTask, 1053] Name of VM that caused fault : Test2
[2018-01-19 23:43:33:019 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 INFO] [vciClusterJobSchedulerTask, 1071] DRS API indicates no active host available in cluster. Discarding fault.
[2018-01-19 23:43:33:020 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 ERROR] [vciClusterJobSchedulerTask, 1053] Name of VM that caused fault : Test3
[2018-01-19 23:43:33:022 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 INFO] [vciClusterJobSchedulerTask, 1071] DRS API indicates no active host available in cluster. Discarding fault.
[2018-01-19 23:43:33:022 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 INFO] [vciClusterJobSchedulerTask, 1122] Fault caused by Host state
[2018-01-19 23:43:33:022 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 INFO] [vciClusterJobSchedulerTask, 1160] No of hosts recommended by DRS API : 0
[2018-01-19 23:43:33:022 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 INFO] [vciClusterJobSchedulerTask, 440] DRS API did not return any hosts
[2018-01-19 23:43:33:022 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 INFO] [vciClusterJobSchedulerTask, 473] Mmode failure choice is retry
[2018-01-19 23:43:33:022 'VciTaskBase.VciClusterJobDispatcherTask{17}' 139968768141056 INFO] [vciClusterJobSchedulerTask, 1364] No of attempts: 0 for host host-131

Note: For Windows deployments, the log is located in C:\Documents and Settings\All Users\Application Data\VMware\Update Manager\ by default.
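
To check a log for this signature without reading it line by line, a small script can look for the key messages shown above. The following is a minimal sketch, not a VMware-provided tool: the names `Attempt`, `parse_attempts`, and `matches_symptom` are hypothetical, and the message patterns are copied from the sample entries in this article, so they may differ in other builds.

```python
import re
import sys
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

# Message fragments copied from the sample log entries in this article.
HOSTS_PASSED = re.compile(r"No of hosts being passed to DRS API : (\d+)")
HOSTS_RECOMMENDED = re.compile(r"No of hosts recommended by DRS API : (\d+)")
FAULT_VM = re.compile(r"Name of VM that caused fault : (.+?)\s*$")
NO_ACTIVE_HOST = "DRS API indicates no active host available in cluster"


@dataclass
class Attempt:
    """One enter-maintenance-mode attempt reconstructed from the log."""
    hosts_passed: int
    hosts_recommended: Optional[int] = None
    no_active_host: bool = False
    fault_vms: List[str] = field(default_factory=list)


def parse_attempts(lines: Iterable[str]) -> List[Attempt]:
    """Group log lines into attempts; each 'hosts being passed' line starts one."""
    attempts: List[Attempt] = []
    current: Optional[Attempt] = None
    for line in lines:
        match = HOSTS_PASSED.search(line)
        if match:
            current = Attempt(hosts_passed=int(match.group(1)))
            attempts.append(current)
            continue
        if current is None:
            continue  # ignore lines before the first attempt
        match = FAULT_VM.search(line)
        if match:
            current.fault_vms.append(match.group(1))
            continue
        if NO_ACTIVE_HOST in line:
            current.no_active_host = True
            continue
        match = HOSTS_RECOMMENDED.search(line)
        if match:
            current.hosts_recommended = int(match.group(1))
    return attempts


def matches_symptom(attempt: Attempt) -> bool:
    """True when two hosts were passed to DRS and none were recommended."""
    return (
        attempt.hosts_passed == 2
        and attempt.hosts_recommended == 0
        and attempt.no_active_host
    )


if __name__ == "__main__":
    # Usage: python check_vum_log.py <path to vmware-vum-server-log4cpp.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for attempt in parse_attempts(handle):
            if matches_symptom(attempt):
                print("Symptom found; VMs reported:", ", ".join(attempt.fault_vms))
```

A match only indicates that the log resembles the entries in this article; confirm that the cluster has two hosts and HA enabled before applying the resolution.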

Environment

VMware vCenter Server 6.5.x
VMware vSphere Update Manager 6.7.x
VMware vCenter Server 6.7.x
VMware vSphere Update Manager 6.5.x

Cause

vCenter Server 6.5 and later contain several improvements to the DRS API for distributing workload. One of these changes affects VUM remediation of two-host clusters when HA is enabled.

If you keep HA enabled, remediation attempts on the hosts in the cluster fail because HA cannot provide a recommendation to Update Manager to place either host into maintenance mode.

If one of the two hosts is placed into maintenance mode, no failover host is left available in the cluster.
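
The host-count reasoning above can be written out as a toy check. This is a deliberately simplified model for illustration only, not the logic that DRS or HA actually implements; the function name `can_enter_maintenance` and its parameters are hypothetical.

```python
def can_enter_maintenance(active_hosts: int, ha_enabled: bool) -> bool:
    """Toy model: may one host leave without losing the failover host?

    Assumes active_hosts >= 1 and that HA needs at least one failover
    host in addition to the host that keeps running the workload.
    """
    if not ha_enabled:
        return True  # no failover requirement to satisfy
    remaining = active_hosts - 1      # hosts left once one enters maintenance mode
    failover_hosts = remaining - 1    # hosts left over to act as failover targets
    return failover_hosts >= 1
```

In this model a two-host cluster with HA enabled has zero failover hosts after one host leaves, so the request is refused, while the same cluster with HA disabled, or a three-host cluster with HA enabled, is allowed to proceed.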

Resolution

To resolve this issue and remediate a 2-node cluster successfully, either disable HA on the cluster or place the hosts in maintenance mode manually, and then perform remediation on the two hosts in the cluster.

Additional Information

For more information, see:

Remediating Hosts section in the vSphere Product Documentation.