"Cannot complete the configuration of the vSphere HA agent on the host. Applying HA VIB's on the cluster encountered a failure"
[YYYY-MM-DDTHH:MM:SS] error vpxd[48856] [Originator@6876 sub=DAS opID=########-#######-####-####-##:########-##-##] Timed out while waiting to monitor the progress of task: ########-####-####-####-############:com.vmware.esx.settings.clusters.software.ha_internal
[YYYY-MM-DDTHH:MM:SS] error vpxd[48856] [Originator@6876 sub=DAS opID=########-#######-####-####-##:########-##-##] Apply HA task failed with error N5Vmomi5Fault11SystemError9ExceptionE(Fault cause: vmodl.fault.SystemError
--> )
--> [context]zKq7AVECAQAAAG0mVQETdnB4ZAAA9tg3bGlidm1hY29yZS5zbwAAjXgsAAtsLQAT6TIBwaFvdnB4ZAABhaNvARxlxgFrd8YBTYHGgenLYAGBKs1gAYFY3GABgbsJYAGBhrNgAQCnSSMANZ8jALRkNwKHfwBsaWJwdGhyZWFkLnNvLjAAAy82D2xpYmMuc28uNgA=[/context]
[YYYY-MM-DDTHH:MM:SS] error vpxd[48856] [Originator@6876 sub=DAS opID=########-#######-####-####-##:########-##-##] ApplyHA result is null
[YYYY-MM-DDTHH:MM:SS] info vpxd[48856] [Originator@6876 sub=Default opID=########-#######-####-####-##:########-##-##] [VpxLRO] -- ERROR task-14896662 -- Cluster-01 -- DasConfig.ConfigureCluster: vim.fault.DasConfigFault:
--> Result:
--> (vim.fault.DasConfigFault) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = <unset>,
--> reason = "ApplyHAVibsOnClusterFailed",
--> output = <unset>,
--> event = <unset>
--> msg = ""
--> }
--> Args:
-->
[YYYY-MM-DDTHH:MM:SS] info vmware-vum-server[42393] [Originator@6876 sub=ClusterApplyHATask] [Task, 457] Task:com.vmware.vcIntegrity.lifecycle.ClusterApplyHATask ID:########-####-####-####-############. Task Created
[YYYY-MM-DDTHH:MM:SS] info vmware-vum-server[01166] [Originator@6876 sub=ClusterApplyHATask] [Task, 457] Task:com.vmware.vcIntegrity.lifecycle.ClusterApplyHATask ID:########-####-####-####-############. Task State updated to SUCCEEDED
Example:
[YYYY-MM-DDTHH:MM:SS] lifecycle: 35769095: runcommand:186 INFO runcommand called with: args = '['/sbin/smbiosDump']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
Gap of 10 minutes
[YYYY-MM-DDTHH:MM:SS] lifecycle: 35769095: upgrade_precheck:2160 INFO Image size: 270 MB, Maximum size: 4084 MB
Gap of 9 minutes
[YYYY-MM-DDTHH:MM:SS] lifecycle: 35769095: upgrade_precheck:2222 INFO Locker currently have 190036056 bytes in package folder, and 109571997696 bytes free. Incoming image has 168691854 bytes of locker payloads, estimate to take 208946765 bytes of space.
[YYYY-MM-DDTHH:MM:SS] ConfigStore[35782741]: SlowRefresh: path /vmfs/volumes/<Datastore_UUID> total blocks 16492405981184 used blocks 13870742634496forceRefresh
[YYYY-MM-DDTHH:MM:SS] ConfigStore[35782741]: SlowRefresh: path /vmfs/volumes/<Datastore_UUID> total blocks 128580583424 used blocks 19013828608forceRefresh = 0
Note: The above log messages appear only on clusters that are managed by vLCM images.
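The multi-minute gaps between consecutive log entries shown in the example above can also be located with a short script rather than by reading the log manually. The following is a minimal sketch, not a supported tool. It assumes each entry begins with an ISO 8601 timestamp (with or without surrounding brackets); the function name find_gaps and the 5-minute default threshold are illustrative choices, not values taken from the product.

```python
import re
from datetime import datetime, timedelta

# Matches an ISO 8601 timestamp at the start of a line, with or without a
# leading bracket. Fractional seconds and time zone suffixes are ignored.
TS_RE = re.compile(r"^\[?(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (gap, previous_line, next_line) tuples for consecutive
    timestamped lines that are more than `threshold` apart.

    The threshold default is an assumption chosen for illustration.
    """
    gaps = []
    prev_ts = None
    prev_line = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # continuation lines (for example "--> ...") carry no timestamp
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        if prev_ts is not None and ts - prev_ts > threshold:
            gaps.append((ts - prev_ts, prev_line, line.rstrip("\n")))
        prev_ts, prev_line = ts, line.rstrip("\n")
    return gaps
```

To use it, pass an open file object for a copy of the host's lifecycle log (or any list of log lines) to find_gaps and print the returned tuples. Each tuple gives the length of the gap and the entries immediately before and after it.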
Note: The counter for this timeout value starts after the task is started in the vum-server and not in vpxd.
Set the vSphere HA advanced option das.remediateHATaskTimeoutSecs to a value of at least 1800 seconds.
Note: The timeout value is specified in seconds. It can be increased further if required, depending on the environment.
For information on how to set HA advanced parameters, refer to vSphere HA Advanced Options.
To check whether filesystem operations are slow, execute:
time esxcli storage filesystem list
Note: In environments with a large number of datastores, HA configuration takes longer because of the delay in querying the filesystems.
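Because a single measurement can be misleading, it may help to repeat the timing check above a few times and compare the results. The sketch below assumes a Python 3 interpreter is available where the command is run; the helper name time_command and the default of three runs are hypothetical choices for illustration.

```python
import subprocess
import time


def time_command(cmd, runs=3):
    """Run `cmd` `runs` times and return the wall-clock duration of each
    run in seconds. Command output is discarded; only timing is measured."""
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,  # a non-zero exit code still yields a timing sample
        )
        durations.append(time.monotonic() - start)
    return durations


# Intended call on a host where esxcli is available (assumption):
# time_command(["esxcli", "storage", "filesystem", "list"])
```

Consistently long durations across runs point to slow filesystem queries rather than a one-off delay.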