AWA: Poor Performance - heavy PWP utilization, multiple lines of READY_FOR_RUN

Article ID: 192610

Products

CA Automic Workload Automation - Automation Engine
CA Automic One Automation

Issue/Introduction

After a period of Agent disconnects, PWP utilization increases sharply and its log shows multiple lines similar to:

20200608/090951.299 - U00011183 Client '0101', RunID '0762422582'. Message 'EXAKTJ' for Agent 'AGENT_NAME1' was acknowledged with error code 63 (not connected). Job status will be changed to READY_FOR_RUN.
20200608/090951.330 - U00011183 Client '0101', RunID '0761590350'. Message 'EXAKTJ' for Agent 'AGENT_NAME2' was acknowledged with error code 63 (not connected). Job status will be changed to READY_FOR_RUN.

The symptoms typically look like this:

1.) The Agent generates a core dump or ends. The WPs do not appear to reflect the issue.
2.) WP shows JOB_INF / losconn processing.
3.) PWP then goes into the following loop:

20200706/234942.912 - U00000063 Partner 'AGENT_NAME1' is not connected to the server.
20200706/234942.932 - U00011183 Client '4000', RunID '0081848970'. Message 'EXAKTJ' for Agent 'AGENT_NAME1' was acknowledged with error code 63 (not connected). Job status will be changed to READY_FOR_RUN.
20200706/234942.953 - U00011183 Client '4000', RunID '0081848975'. Message 'EXAKTJ' for Agent 'AGENT_NAME1' was acknowledged with error code 63 (not connected). Job status will be changed to READY_FOR_RUN.
20200706/234942.985 - U00011183 Client '4000', RunID '0081848971'. Message 'EXAKTJ' for Agent 'AGENT_NAME1' was acknowledged with error code 63 (not connected). Job status will be changed to READY_FOR_RUN.
20200706/234942.997 - U00000063 Partner 'AGENT_NAME1' is not connected to the server.

4.) PWP will also show many lines like this:
U00011175 Negative JOB_INF was sent from Agent 'AGENT_NAME1'. Job name 'JOB_NAME' (RunID '0081868437'), old job status 'Start initiated', new job status 'Unknown'

5.) The system becomes more and more unresponsive: the MQPWP count climbs to over 500k, and jobs get stuck in preparing/generating against that Agent.
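
To see which Agents are involved, the U00011183 and U00011175 lines can be tallied per Agent. The following is a minimal Python sketch, assuming the log line format quoted in this article; the function name and the regular expression are illustrative and are not part of the Automic product.

```python
import re
from collections import Counter

# Captures the Agent name from U00011183 and U00011175 lines as quoted above.
# U00000063 "Partner ... is not connected" lines are intentionally ignored.
AGENT_RE = re.compile(r"U000111(?:83|75)\b.*?Agent '([^']+)'")

def count_affected_agents(lines):
    """Return a Counter mapping Agent name -> number of matching messages."""
    counts = Counter()
    for line in lines:
        match = AGENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    "20200608/090951.299 - U00011183 Client '0101', RunID '0762422582'. "
    "Message 'EXAKTJ' for Agent 'AGENT_NAME1' was acknowledged with error "
    "code 63 (not connected). Job status will be changed to READY_FOR_RUN.",
    "20200608/090951.330 - U00011183 Client '0101', RunID '0761590350'. "
    "Message 'EXAKTJ' for Agent 'AGENT_NAME2' was acknowledged with error "
    "code 63 (not connected). Job status will be changed to READY_FOR_RUN.",
    "20200706/234942.912 - U00000063 Partner 'AGENT_NAME1' is not connected to the server.",
    "U00011175 Negative JOB_INF was sent from Agent 'AGENT_NAME1'. Job name "
    "'JOB_NAME' (RunID '0081868437'), old job status 'Start initiated', "
    "new job status 'Unknown'",
]

print(count_affected_agents(sample).most_common())
# prints [('AGENT_NAME1', 2), ('AGENT_NAME2', 1)]
```

In practice the lines would be read from a saved copy of the PWP log instead of the inline sample.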

Environment

Release : 12.3

Component : AUTOMATION ENGINE

Cause

The Agents did not complete their connection to the AE correctly. As a result, the AE mishandles the communication used to activate JOBS on those Agents.

Resolution

Fixed in AutomationEngine 12.3.6+hf3 or higher

Workaround

1) Note the names of any Agents listed with error code 63 or with U00011175 Negative JOB_INF, and restart those Agents. After each restart, run a job on the Agent and confirm that these messages are no longer written to the PWP.

2) A possible workaround is to monitor for the U00011183 "U code". Please note that this is only an example, and putting it in place requires the help of someone knowledgeable in Automic scripting:

  1. Create a SQLI variable that searches for a U00011183 code that occurred in the last hour, and give it a name like VARA.SQLI.MONITOR_FOR_U00011183. The following is an example of SQL that could be used (this is for Oracle; a DBA may be needed to fine-tune it):

     select rt_content from rt where rt_ah_idnr in 
     (select ah_idnr from ah where ah_name in 
     (select mqsrv_name from mqsrv where mqsrv_type = 4) 
     and ah_status = 0) 
     and rt_content like concat(concat('%', to_char(sysdate -1/24,'YYYYMMDD/HH24')),'%U00011183%');

  2. Create a script object with the following (note that in line 2 the variable name needs to match what's in step 1 above):

    !Set vara name for the variable being used to monitor for the U00011183 code meaning that agents have gone down without the AE noticing correctly
    :set &monitor_vara_name# = 'VARA.SQLI.MONITOR_FOR_U00011183'
    !set up some starting info
    !set up a new line character - nl
    :set &nl# = UC_CRLF()
    !agent list is going to be a list of all agents that were down - there may be duplicates
    :set &agent_list# = "The following agents are down (note, there may be some duplicates)&nl#"
    :set &old_agent_name# = ""

    !start processing sqli that brings back all U00011183 lines from RT
    :set &hnd# = prep_process_var(&monitor_vara_name#)
    :process &hnd#
    !  get the full line from the process
    :  set &line# = get_process_line(&hnd#, 1)
    !  set a flag to Y - Y means continue getting the next Agent that is mentioned in the U00011183 line
    :  set &flag# = "Y"
    :  while &flag# = "Y"
    !    find the start of the U code
    :    set &start# = str_find(&line#, "U00011183")
    !    if the start of the U code does not exist, it will come back with 0 - at that point stop looking
    :    if &start# < 1
    :      set &flag# = "N"
    :    else
    :      set &number_of_ticks# = 7
    :      set &new_string# = &line#
    :      while &number_of_ticks# > 0
    !        set the start of the agent name at the seventh tick after the U code plus 1
    :        set &start# = str_find(&line#, "'", &start#)
    :        set &start# = add(&start#, 1)
    :        set &start# = format(&start#)
    :        set &number_of_ticks# = sub(&number_of_ticks#, 1)
    !        get a new string that starts with the agent name after the U code
    :        set &new_string# = substr(&line#, &start#)
    :      endwhile
    !      find the end of the agent name looking for the ' .
    :      set &end# = str_find(&new_string#, "' ")
    :      set &end# = sub(&end#, 1)
    :      set &end# = format(&end#)
    !      agent name is from start of the new string to the end tick
    :      set &agent_name# = substr(&new_string#, 1, &end#)
    !      check new "agent_name" against "old_agent_name"
    :      if &agent_name# <> &old_agent_name#
    !        set the "old_agent_name" vara to attempt to cut down on duplicates
    :        set &old_agent_name# = &agent_name#
    :        p "agent name is &agent_name#"
    :        set &agent_list# = &agent_list#&agent_name#&nl#
    :      endif
    !      reset &line# to everything after the previous U code (that we just found)
    :      set &line# = substr(&line#, &start#)
    :    endif
    :  endwhile
    :endprocess
    :close_process &hnd#

    :p "final list is &agent_list#"
    !&agent_list# can be passed to a notification from here

  3. Use an ACTIVATE_UC_OBJECT at the end of the above script to send out a notification
  4. Run the script once an hour through an "Execute recurring" action
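
Because the script runs once an hour, it helps to know exactly which window the SQL in step 1 covers. The expression to_char(sysdate - 1/24, 'YYYYMMDD/HH24') yields the date and hour as of one hour ago, so the LIKE pattern matches report lines stamped in the previous clock hour, not a rolling 60 minutes. The following minimal Python sketch reproduces the pattern for a given time; the function name is hypothetical and the code is only an illustration of what the Oracle expression builds.

```python
from datetime import datetime, timedelta

def previous_hour_like_pattern(now):
    """Build the LIKE pattern that the Oracle expression in step 1 yields
    when sysdate equals `now` (illustrative helper, not an Automic API)."""
    # sysdate - 1/24 is one hour earlier; 'YYYYMMDD/HH24' keeps date and hour only.
    prefix = (now - timedelta(hours=1)).strftime("%Y%m%d/%H")
    return f"%{prefix}%U00011183%"

print(previous_hour_like_pattern(datetime(2020, 6, 8, 10, 15)))
# prints %20200608/09%U00011183%
```

A run at 10:15 therefore picks up U00011183 lines stamped 09:00 through 09:59. Scheduling the recurring run at a fixed minute each hour keeps consecutive windows from skipping or repeating an hour.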