vSphere HA is slow to re-enable or fails completely after vCenter Server was patched

Article ID: 413708

Products

VMware vCenter Server

Issue/Introduction

  • After vCenter Server has been patched to a new version, it takes a very long time to re-enable vSphere HA for the host clusters in the environment
  • In some cases, HA fails to enable completely with an error message:

Cannot complete the configuration of vSphere HA agent on the host "Setting desired image spec for cluster failed. Set solution: General System error occurred: Image is not valid.

  • The VMware Update Manager server log, /var/log/vmware/vmware-updatemgr/logs/vmware-vum-server-<number>.log, contains entries similar to the following example:
    <timestamp> info vmware-vum-server[165417] [Originator@6876 sub=com.vmware.vcIntegrity.lifecycle.SetSolutionTask] [Task 627] Set com.vmware.vcIntegrity.lifecycle.SetSolutionTask (########-####-####-####-############) progress to 60
    <timestamp> info vmware-vum-server[165417] [Originator@6876 sub=DraftsManager] [DraftsManager 1530] New progress 60 for Task: com.vmware.vcIntegrity.lifecycle.SetSolutionTask ID: ########-####-####-####-############
    <timestamp> info vmware-vum-server[165417] [Originator@6876 sub=DraftsManager] [DraftsManager 1586] Draft validation results: {
    -->       "errors": [
    -->             {
    -->                   "id": "com.vmware.vcIntegrity.lifecycle.EsxImage.SolutionNotFound",
    -->                   "message": {
    -->                         "args": [
    -->                               "com.vmware.vsphere-ha",
    -->                               "<version>"
    -->                         ],
    -->                         "default_message": "Software Solution com.vmware.vsphere-ha with version <version> cannot be found in depot.",
    -->                         "id": "com.vmware.vcIntegrity.lifecycle.EsxImage.SolutionNotFound",
    -->                         "localized": null,
    -->                         "params": null
    -->                   },
    -->                   "originator": null,
    -->                   "resolution": null,
    -->                   "retriable": null,
    -->                   "time": "<timestamp>",
    -->                   "type": null
    -->             }
    -->       ],
    -->       "info": [],
    -->       "warnings": []
    --> }
  • The Image Service log, /var/log/vmware/vmware-updatemgr/logs/imageservice.log, might contain entries like:
    <timestamp> INFO imageService[140219959989824] [SoftwareSpecMgr 1368] Image validation result: {'info': [], 'warnings': [], 'errors': [{'id': 'com.vmware.vcIntegrity.lifecycle.EsxImage.SolutionNotFound', 'message': {'id': 'com.vmware.vcIntegrity.lifecycle.EsxImage.SolutionNotFound', 'default_message': 'Software Solution com.vmware.vsphere-ha with version <version> cannot be found in depot.', 'args': ['com.vmware.vsphere-ha', '<version>']}, 'resolution': None, 'time': '<timestamp>'}]}
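A quick way to check whether a log contains this error is to search for the message text. The following is a minimal sketch and not part of the attached script: the function name find_missing_ha_solution is hypothetical, and the log files are passed as arguments, so supply the paths listed above as they exist on your appliance.

```shell
#!/bin/bash
# Hypothetical helper: print every log line reporting that the vsphere-ha
# solution cannot be found in the depot, prefixed with file name and line number.
# The pattern matches the "default_message" text shown in both log excerpts.
find_missing_ha_solution() {
    grep -H -n -E \
        'Software Solution com\.vmware\.vsphere-ha with version .* cannot be found in depot' \
        "$@"
}

# Example call (adjust the path to a log file that exists on your system):
# find_missing_ha_solution /path/to/imageservice.log
```

If the function prints nothing, the given files do not contain the SolutionNotFound message for com.vmware.vsphere-ha.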

Environment

  • VMware vCenter Server 7.0.x
  • VMware vCenter Server 8.0.x

Cause

Every new vCenter Server version ships with its own version of the HA agent (FDM). Whenever vCenter Server is patched, Update Manager creates a new offline depot to make this FDM version available for installation on the ESXi hosts in the clusters managed by vCenter Server.

Older versions of these offline depots are no longer required, because all managed hosts, regardless of their own version, always use the FDM version specific to the current vCenter Server version. Nonetheless, these depots are kept. Over time this can lead to a large number of offline depots, which slows down HA activation and might even cause the deployment to fail.

Resolution

Note: The following steps are invasive. Before applying them, take a fresh backup or snapshot of the vCenter Server Appliance (VCSA). If Enhanced Linked Mode (ELM) is configured, offline snapshots (taken in powered-off state) must be created for all members of the ELM replication setup.

To fix this issue:

  1. Open an SSH connection to the VCSA
  2. Log in with the root account
  3. Enter the following command to change into the Bash shell:
    # shell
  4. List the VUM offline depots:
    # dcli com vmware esx settings depots offline list
  5. Identify any depots that are owned by vsphere-ha but whose version differs from the version vCenter Server is currently running, and note down their IDs
  6. Run the following command for each of these IDs to clean up the old depots:
    # dcli com vmware esx settings depots offline delete --depot <ID>
  7. Restart the Update Manager service:
    # vmon-cli -r updatemgr
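
When several depots have to be removed, steps 6 and 7 can be wrapped in a small loop. The following is a minimal sketch under stated assumptions, not the script attached to this KB: the names cleanup_depots, run, and DRY_RUN are hypothetical, and because the output format of the list command is not shown in this article, the sketch does not parse it. The depot IDs identified in step 5 are passed in by hand. By default it only prints the commands it would run.

```shell
#!/bin/bash
# Hypothetical wrapper around steps 6 and 7.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

cleanup_depots() {
    local id
    # Step 6: delete each outdated depot by the ID noted in step 5.
    for id in "$@"; do
        run dcli com vmware esx settings depots offline delete --depot "$id"
    done
    # Step 7: restart Update Manager once, only if something was deleted.
    if [ "$#" -gt 0 ]; then
        run vmon-cli -r updatemgr
    fi
}

# Depot IDs are taken from the command line; with no IDs nothing happens.
cleanup_depots "$@"
```

Saved to a file, the sketch could be called with the depot IDs as arguments, first as a dry run to review the commands and then again with DRY_RUN=0 once the list is confirmed. As with the attached script, test it on a non-production VCSA first.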

Additional Information

An example script that automates the above steps is attached to this KB. The script removes the older versions of the HA offline depots and additionally cleans out the depots that contain older versions of the WCP agents. Thoroughly test the example script on a non-production vCenter Server Appliance before attempting to use it on a production VCSA.

Attachments

depot_cleanup.sh