What triggers agentgroup resolution within the AE


Article ID: 232359


Products

CA Automic One Automation
CA Automic Workload Automation - Automation Engine

Issue/Introduction

Periodically, jobs that run in an agentgroup start long after they are activated. In a sample of the affected jobs, the agentgroup does not appear to resolve correctly; after some time it becomes 'unstuck' without manual intervention and all of the jobs start.

  1. The hostgroup has a limit on the number of jobs that can run per host.
  2. All of these hosts are active when this occurs.

    The following appears in the job report, with U00005014 logged multiple times for the same job:

    2021-12-15 08:40:06 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 08:41:25 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 08:42:10 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 08:42:25 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 08:44:45 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 08:46:05 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 08:49:45 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 08:55:05 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 09:00:09 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 09:00:29 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
  3. At the same time, other jobs show the same symptoms:

    2021-12-15 08:48:37 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 08:49:45 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 08:55:06 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 09:00:10 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.
    2021-12-15 09:00:30 - U00005014 AgentGroup 'HOSTG.NAME', client '1000', version '455' has been resolved.

Once one of the impacted jobs resolves its agentgroup correctly, all of the impacted jobs start.

What mechanism does the AE use for agentgroup resolution, and when does it run? Based on the timings, it is not per-minute, but when it happens, it applies to all tasks pending within the agentgroup.
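To make the irregular timing easier to see, the gaps between consecutive U00005014 messages can be computed from the job report. The following is a minimal sketch, not an Automic tool: the function name `resolution_gaps` and the regular expression are assumptions for illustration, and the sample lines are rebuilt from the timestamps of the first job report excerpt above.

```python
import re
from datetime import datetime

# Assumed pattern: matches U00005014 lines in the format shown in the job report.
LINE_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - U00005014 AgentGroup '([^']+)'"
)


def resolution_gaps(report_lines):
    """Return the seconds between consecutive U00005014 messages."""
    stamps = []
    for line in report_lines:
        match = LINE_RE.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    # Pair each timestamp with the next one and take the difference.
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# Timestamps copied from the first job report excerpt.
times = ["08:40:06", "08:41:25", "08:42:10", "08:42:25", "08:44:45",
         "08:46:05", "08:49:45", "08:55:05", "09:00:09", "09:00:29"]
sample = [
    f"2021-12-15 {t} - U00005014 AgentGroup 'HOSTG.NAME', client '1000', "
    "version '455' has been resolved."
    for t in times
]

gaps = resolution_gaps(sample)
print(gaps)       # [79, 45, 15, 140, 80, 220, 320, 304, 20]
print(max(gaps))  # 320
```

The gaps range from 15 to 320 seconds, which matches the observation that resolution does not run on a fixed per-minute schedule.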



Environment

Release : 12.3.3

Component :

Resolution

The UC_SYSTEM_SETTINGS variable AGENTGROUP_CHECK_INTERVAL controls "the interval in minutes at which tasks that are waiting for the host of an agent group are checked." This means that once an agentgroup is "fixed", it can take up to 10 minutes by default for waiting tasks to resolve the agentgroup.
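The bound described above can be checked against observed times before deciding whether a delay is expected. This is a rough sketch under stated assumptions: `exceeds_check_interval` is a hypothetical helper, the 10-minute default is taken from this article and should be confirmed against the value set in your own UC_SYSTEM_SETTINGS, and the "ready" timestamps in the example are invented for illustration (only 09:00:29 comes from the job report).

```python
from datetime import datetime, timedelta

# Default taken from this article; confirm the actual value in UC_SYSTEM_SETTINGS.
DEFAULT_CHECK_INTERVAL_MIN = 10


def exceeds_check_interval(group_ready_at, group_resolved_at,
                           interval_min=DEFAULT_CHECK_INTERVAL_MIN):
    """True if the wait between 'agentgroup ready' and 'resolved' exceeds the interval."""
    wait = group_resolved_at - group_ready_at
    return wait > timedelta(minutes=interval_min)


resolved = datetime(2021, 12, 15, 9, 0, 29)

# Hypothetical: hosts became available at 08:52:00, so the wait is 8m29s.
print(exceeds_check_interval(datetime(2021, 12, 15, 8, 52, 0), resolved))  # False

# Hypothetical: hosts were already available at 08:40:00, so the wait is 20m29s.
print(exceeds_check_interval(datetime(2021, 12, 15, 8, 40, 0), resolved))  # True
```

A `False` result is consistent with the check interval alone; a `True` result is the case where gathering traces is warranted.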

If the time between an agentgroup being ready and a job resolving the agentgroup exceeds the AGENTGROUP_CHECK_INTERVAL setting, reproduce the behavior with tcp/ip=2 and database=3 traces enabled on the WPs, starting before the jobs are activated and continuing until they start. Once the traces are gathered, open a case with Broadcom Support and send:

  • WP and CP logs
  • WP traces
  • RunID(s) for the job(s) that had this issue
  • XML export of the impacted job(s)
  • XML export of the agentgroup