When attempting to enable HA, the following errors and symptoms may be observed:
Task List Errors:
"Cannot complete the configuration of the vSphere HA agent on the host. 'Setting desired image spec for cluster failed'"
"Cannot complete the configuration of the vSphere HA agent on the host. Applying HA VIBs on the cluster encountered a failure"
General Symptoms:
HA remediation tasks get stuck at 10%.
The Update Manager service frequently stops.
"No healthy upstream" is displayed on the Host Updates page and the Lifecycle Manager page.
Misconfigured custom image from the vendor. vSphere FDM (Fault Domain Manager) is a "solution", but it is incorrectly listed as a "component" in the cluster image. This faulty image prevents remediation from succeeding and causes the Update Manager service to crash.
YYYY-MM-DDTHH:MM:SS.148-06:00 info vmware-vum-server[3034753] [Originator@6876 sub=EntityImageManager] [EntityImageManager 540] [Get] image {
--> "add_on": {
--> "name": "Cisco-UCS-Addon-ESXi",
--> "version": "4.3.5-a"
--> },
--> "alternative_images": null,
--> "base_image": {
--> "version": "8.0.3-0.70.24674464"
--> },
--> "components": {
--> "vsphere-fdm": "8.0.3-24322831"
--> },
--> "hardware_support": null,
--> "removed_components": null,
--> "solutions": {
--> "com.vmware.vsphere-ha": {
--> "components": [
--> {
--> "component": "vsphere-fdm"
--> }
--> ],
--> "version": "8.0.3-24674346"
--> }
--> }
HA is a solution, so users should not manage it directly; there is no need to add vSphere FDM to a cluster image as a component.
Modify the cluster image to remove the incorrectly listed vSphere FDM "component":
Click the 3 horizontal dots next to the Edit button to export the image as a JSON file, edit the file to remove the component, and then reimport it.
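Example of the faulty image spec as logged by vmware-vum-server (the arrow marks the incorrectly listed component):
info vmware-vum-server[3034753] [Originator@6876 sub=EntityImageManager] [EntityImageManager 1019] [SoftwareSpecToStr] image spec(JSON String): {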
--> "add_on": {
--> "name": "Cisco-UCS-Addon-ESXi",
--> "version": "4.3.5-a"
--> },
--> "alternative_images": null,
--> "base_image": {
--> "version": "8.0.3-0.70.24674464"
--> },
--> "components": {
--> "vsphere-fdm": "8.0.3-24322831" <<<-----
--> },
--> "hardware_support": null,
--> "removed_components": null,
--> "solutions": {
--> "com.vmware.vsphere-ha": {
--> "components": [
--> {
--> "component": "vsphere-fdm"
--> }
--> ],
--> "version": "8.0.3-24674346"
--> }
--> }
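The condition shown in the log excerpt above (a component that belongs to a solution also appearing under "components") can be checked programmatically. The following is a minimal sketch, not a VMware tool: the function name is hypothetical, and it assumes the image spec has the same JSON shape as the logged spec.

```python
import json


def misplaced_solution_components(image_spec: dict) -> list:
    """Return names listed under "components" that are also owned by a solution.

    Hypothetical helper; assumes the JSON layout shown in the log excerpt.
    """
    # Either key may be null in the spec, so fall back to empty containers.
    user_components = image_spec.get("components") or {}
    solutions = image_spec.get("solutions") or {}

    # Collect every component name referenced by any solution.
    solution_owned = {
        entry["component"]
        for solution in solutions.values()
        for entry in (solution.get("components") or [])
    }
    return sorted(name for name in user_components if name in solution_owned)


# The faulty spec from the log excerpt, with the log prefixes removed.
faulty_spec = json.loads("""
{
    "add_on": {"name": "Cisco-UCS-Addon-ESXi", "version": "4.3.5-a"},
    "alternative_images": null,
    "base_image": {"version": "8.0.3-0.70.24674464"},
    "components": {"vsphere-fdm": "8.0.3-24322831"},
    "hardware_support": null,
    "removed_components": null,
    "solutions": {
        "com.vmware.vsphere-ha": {
            "components": [{"component": "vsphere-fdm"}],
            "version": "8.0.3-24674346"
        }
    }
}
""")

print(misplaced_solution_components(faulty_spec))  # prints ['vsphere-fdm']
```

An empty list means no solution-owned component is listed under "components".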
Example of a corrected image spec:
{
    "add_on": {
        "name": "Cisco-UCS-Addon-ESXi",
        "version": "4.3.5-a"
    },
    "alternative_images": null,
    "base_image": {
        "version": "8.0.3-0.73.24784735"
    },
    "components": null,
    "hardware_support": null,
    "removed_components": null,
    "solutions": null
}
Remediation will then succeed and HA will start as expected.
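The manual edit of the exported JSON file can also be scripted. The sketch below is illustrative only: the function names and file names are hypothetical, it assumes the exported file uses the same keys as the image spec shown in this article, and it sets "components" to null when nothing else remains, mirroring the corrected example. Review the output file before reimporting it.

```python
import json
from pathlib import Path


def remove_component(image_spec: dict, name: str = "vsphere-fdm") -> dict:
    """Return a copy of the image spec without the named entry in "components".

    Hypothetical helper; the input dictionary is not modified.
    """
    cleaned = dict(image_spec)
    components = dict(cleaned.get("components") or {})
    components.pop(name, None)  # no error if the component is already absent
    # Assumption: an empty component list is written as null, as in the
    # corrected example in this article.
    cleaned["components"] = components or None
    return cleaned


def clean_exported_image(src: str, dst: str, name: str = "vsphere-fdm") -> None:
    """Read an exported image JSON file and write a cleaned copy to dst."""
    spec = json.loads(Path(src).read_text())
    Path(dst).write_text(json.dumps(remove_component(spec, name), indent=4))


# Example call (file names are placeholders for the exported and edited files):
# clean_exported_image("cluster-image-export.json", "cluster-image-fixed.json")
```

Writing to a separate output file keeps the original export available in case the reimport needs to be repeated.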