When remediating a cluster of ESXi hosts with Lifecycle Manager in the vCenter Server GUI (vSphere Client), the remediation task gets stuck waiting for a specific host in the cluster to remediate. The task never progresses because of a host failure (for example, the physical host cannot reboot after the new ESXi software is installed).
This causes a delay for the other hosts in the cluster, preventing them from remediating.
vCenter Server 8.x
ESXi 8.x
While the host issue is being resolved, the host needs to be removed from the cluster, in order to allow the other hosts in the cluster to be remediated.
To work around the stuck remediation task and the host failure:
1) Migrate the VMs off the failing host to another host, if possible.
2) Remove the host from the cluster object in vSphere Client.
3i) Restart the vSphere UI service from the vCenter Server Appliance shell with service-control --stop vsphere-ui && service-control --start vsphere-ui. When the vSphere Client comes back, check whether the remediation task has cleared.
3ii) If the remediation task is still stuck, restart the VPXD service with service-control --stop vmware-vpxd && service-control --start vmware-vpxd. When the vSphere Client comes back, check the task again.
3iii) If the remediation task is still stuck, reboot the vCenter Server. After it boots, confirm that the task has cleared.
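The service restarts in steps 3i and 3ii could be wrapped in a small helper, roughly as follows. This is a minimal sketch, assuming it is run as root in the vCenter Server Appliance shell. The function name restart_vc_service and the SERVICE_CONTROL variable are hypothetical names introduced for illustration; only the service-control --stop and --start calls come from the steps above.

```shell
# Path or name of the service-control tool; overridable for dry runs.
# (SERVICE_CONTROL is an illustrative variable, not a VMware setting.)
SERVICE_CONTROL="${SERVICE_CONTROL:-service-control}"

# Hypothetical helper: stop the named service, then start it again.
# The start is skipped if the stop fails, matching the "&&" in the steps.
restart_vc_service() {
    "$SERVICE_CONTROL" --stop "$1" && "$SERVICE_CONTROL" --start "$1"
}

# Step 3i: restart the vSphere Client (UI) service first.
#   restart_vc_service vsphere-ui
# Step 3ii: only if the task is still stuck after checking the vSphere Client.
#   restart_vc_service vmware-vpxd
```

The calls are left commented out on purpose: check the task in the vSphere Client between the two restarts, because restarting vmware-vpxd interrupts vCenter Server management operations and should be done only if the UI restart did not clear the task.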
With the remediation task cleared and the failing host removed from the ESXi cluster, remediation can be attempted again.
Reference KB: