Unexpected failure alarm for every retry when using the n_retrys attribute

Article ID: 249432

Products

CA Workload Automation AE

Issue/Introduction

Any job defined with the n_retrys attribute triggers a JOBFAILURE alarm on each retry. The failure alarm used to trigger only after the maximum number of retries was reached, but that is no longer the case.

Is this a new feature or a bug?

Environment

Release : 11.3.6

Component : CA Workload Automation AE (AutoSys)

Resolution

This is the expected behavior. If the job fails, is restarted, and fails again, each failure generates its own JOBFAILURE alarm, and a MAX_RETRYS alarm follows once the restarts allowed by n_retrys are exhausted.

If you have seen different behavior, please provide your complete example and exact version.

--- Test using 12.0 SP1 ---

/* ----------------- cmd99 ----------------- */
insert_job: cmd99   job_type: CMD
command: command99
machine: host1
owner: autosys@host1
permission:
date_conditions: 0
n_retrys: 3
alarm_if_fail: 1
alarm_if_terminated: 1

$ sendevent -E STARTJOB -J cmd99
$ autosyslog -e


Monitoring AutoSys Workload Automation Scheduler Log:
        /opt/CA/WorkloadAutomationAE/autouser.R12/out/event_demon.R12

        *** To break out type control-c (^c) ***

[09/02/2022 13:37:06]      CAUAJM_I_40245 EVENT: STARTJOB         JOB: cmd99
[09/02/2022 13:37:06]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: STARTING        JOB: cmd99           MACHINE: host1
[09/02/2022 13:37:06]      CAUAJM_I_10082 [host1 connected for cmd99 110.10816.1]
[09/02/2022 13:37:07]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RUNNING         JOB: cmd99           MACHINE: host1
[09/02/2022 13:37:07]      <Executing at WA_AGENT>
[09/02/2022 13:37:07]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE         JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:37:07]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:37:07]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RESTART         JOB: cmd99           MACHINE: host1
[09/02/2022 13:37:07]      <Application FAILURE Restart.>
[09/02/2022 13:37:07]      CAUAJM_I_40109 Scheduled [cmd99 110.10816.1] due to RESTART event.
[09/02/2022 13:37:20]      CAUAJM_I_80021 The agent inventory service has evaluated the statuses of 2 machine(s) in 0.201 seconds.
[09/02/2022 13:37:22]      CAUAJM_I_40245 EVENT: STARTJOB         JOB: cmd99
[09/02/2022 13:37:22]      <Scheduled due to RESTART event.>
[09/02/2022 13:37:22]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: STARTING        JOB: cmd99           MACHINE: host1

[09/02/2022 13:37:22]      CAUAJM_I_10082 [host1 connected for cmd99 110.10816.2]
[09/02/2022 13:37:23]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RUNNING         JOB: cmd99           MACHINE: host1
[09/02/2022 13:37:23]      <Executing at WA_AGENT>
[09/02/2022 13:37:23]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE         JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:37:23]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:37:23]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RESTART         JOB: cmd99           MACHINE: host1
[09/02/2022 13:37:23]      <Application FAILURE Restart.>
[09/02/2022 13:37:23]      CAUAJM_I_40109 Scheduled [cmd99 110.10816.2] due to RESTART event.
[09/02/2022 13:37:43]      CAUAJM_I_40245 EVENT: STARTJOB         JOB: cmd99
[09/02/2022 13:37:43]      <Scheduled due to RESTART event.>
[09/02/2022 13:37:43]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: STARTING        JOB: cmd99           MACHINE: host1
[09/02/2022 13:37:43]      CAUAJM_I_10082 [host1 connected for cmd99 110.10816.3]
[09/02/2022 13:37:44]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RUNNING         JOB: cmd99           MACHINE: host1
[09/02/2022 13:37:44]      <Executing at WA_AGENT>
[09/02/2022 13:37:44]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE         JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:37:44]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:37:44]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RESTART         JOB: cmd99           MACHINE: host1
[09/02/2022 13:37:44]      <Application FAILURE Restart.>
[09/02/2022 13:37:44]      CAUAJM_I_40109 Scheduled [cmd99 110.10816.3] due to RESTART event.
[09/02/2022 13:38:00]      ----------------------------------------
[09/02/2022 13:38:09]      CAUAJM_I_40245 EVENT: STARTJOB         JOB: cmd99
[09/02/2022 13:38:09]      <Scheduled due to RESTART event.>
[09/02/2022 13:38:09]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: STARTING        JOB: cmd99           MACHINE: host1
[09/02/2022 13:38:09]      CAUAJM_I_10082 [host1 connected for cmd99 110.10816.4]
[09/02/2022 13:38:10]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RUNNING         JOB: cmd99           MACHINE: host1
[09/02/2022 13:38:10]      <Executing at WA_AGENT>
[09/02/2022 13:38:10]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE         JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:38:10]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:38:10]      CAUAJM_I_40245 EVENT: ALARM            ALARM: MAX_RETRYS       JOB: cmd99           MACHINE: host1
[09/02/2022 13:38:10]      <Have EXCEEDED the Max # (3) of application restarts.>
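The alarm pattern in the log above can be tallied with a short shell helper. This is only a sketch: `count_alarms` is a hypothetical function name (not an AutoSys command), and it assumes the log text uses the same `EVENT: ALARM ... ALARM: <type> JOB: <name>` layout shown in this test. The sample lines are copied from the log above.

```shell
# count_alarms JOB: read scheduler log text on stdin and count the ALARM
# events raised for JOB, broken down into JOBFAILURE and MAX_RETRYS.
# Hypothetical helper; assumes the log layout shown in this article.
count_alarms() {
  job=$1
  awk -v job="$job" '
    /EVENT: ALARM/ {
      alarm = ""; name = ""
      # Pick up the token that follows each "ALARM:" and "JOB:" label.
      for (i = 1; i < NF; i++) {
        if ($i == "ALARM:") alarm = $(i + 1)
        if ($i == "JOB:")   name  = $(i + 1)
      }
      if (name == job && alarm != "") count[alarm]++
    }
    END {
      printf "JOBFAILURE=%d MAX_RETRYS=%d\n", count["JOBFAILURE"], count["MAX_RETRYS"]
    }
  '
}

# Alarm lines from the n_retrys: 3, alarm_if_fail: 1 run, plus one
# non-alarm line that the helper should ignore.
sample_log='[09/02/2022 13:37:07]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:37:23]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:37:44]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:38:10]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE         JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:38:10]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:38:10]      CAUAJM_I_40245 EVENT: ALARM            ALARM: MAX_RETRYS       JOB: cmd99           MACHINE: host1'

printf '%s\n' "$sample_log" | count_alarms cmd99
# prints: JOBFAILURE=4 MAX_RETRYS=1
```

With n_retrys: 3 the job runs four times (the initial run plus three restarts), which matches the four JOBFAILURE alarms and the single MAX_RETRYS alarm in the log.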


With alarm_if_fail set to 0, the JOBFAILURE alarms are no longer generated, but the final MAX_RETRYS alarm still is.

$ jil
jil>>1> update_job: cmd99
jil>>2> alarm_if_fail: 0
jil>>3> exit
______________________________________________________________________________

CAUAJM_I_50323 Inserting/Updating job: cmd99
CAUAJM_I_50205 Database Change WAS Successful!
______________________________________________________________________________

CAUAJM_I_52301 Exit Code = 0
______________________________________________________________________________

$ autorep -q -J cmd99


/* ----------------- cmd99 ----------------- */

insert_job: cmd99   job_type: CMD
command: command99
machine: host1
owner: autosys@host1
permission:
date_conditions: 0
n_retrys: 3
alarm_if_fail: 0
alarm_if_terminated: 1

$ sendevent -E STARTJOB -J cmd99
$ autosyslog -e


Monitoring AutoSys Workload Automation Scheduler Log:
        /opt/CA/WorkloadAutomationAE/autouser.R12/out/event_demon.R12

        *** To break out type control-c (^c) ***

[09/02/2022 13:58:26]      CAUAJM_I_80021 The agent inventory service has evaluated the statuses of 2 machine(s) in 0.101 seconds.
[09/02/2022 13:58:40]      CAUAJM_I_40245 EVENT: STARTJOB         JOB: cmd99
[09/02/2022 13:58:40]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: STARTING        JOB: cmd99           MACHINE: host1
[09/02/2022 13:58:40]      CAUAJM_I_10082 [host1 connected for cmd99 110.10827.1]
[09/02/2022 13:58:41]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RUNNING         JOB: cmd99           MACHINE: host1
[09/02/2022 13:58:41]      <Executing at WA_AGENT>
[09/02/2022 13:58:41]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE         JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:58:41]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RESTART         JOB: cmd99           MACHINE: host1
[09/02/2022 13:58:41]      <Application FAILURE Restart.>
[09/02/2022 13:58:41]      CAUAJM_I_40109 Scheduled [cmd99 110.10827.1] due to RESTART event.
[09/02/2022 13:58:56]      CAUAJM_I_40245 EVENT: STARTJOB         JOB: cmd99
[09/02/2022 13:58:56]      <Scheduled due to RESTART event.>
[09/02/2022 13:58:56]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: STARTING        JOB: cmd99           MACHINE: host1
[09/02/2022 13:58:56]      CAUAJM_I_10082 [host1 connected for cmd99 110.10827.2]
[09/02/2022 13:58:57]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RUNNING         JOB: cmd99           MACHINE: host1
[09/02/2022 13:58:57]      <Executing at WA_AGENT>
[09/02/2022 13:58:57]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE         JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:58:57]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RESTART         JOB: cmd99           MACHINE: host1
[09/02/2022 13:58:57]      <Application FAILURE Restart.>
[09/02/2022 13:58:57]      CAUAJM_I_40109 Scheduled [cmd99 110.10827.2] due to RESTART event.
[09/02/2022 13:59:00]      ----------------------------------------
[09/02/2022 13:59:17]      CAUAJM_I_40245 EVENT: STARTJOB         JOB: cmd99
[09/02/2022 13:59:17]      <Scheduled due to RESTART event.>
[09/02/2022 13:59:17]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: STARTING        JOB: cmd99           MACHINE: host1
[09/02/2022 13:59:17]      CAUAJM_I_10082 [host1 connected for cmd99 110.10827.3]
[09/02/2022 13:59:18]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RUNNING         JOB: cmd99           MACHINE: host1
[09/02/2022 13:59:18]      <Executing at WA_AGENT>
[09/02/2022 13:59:18]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE         JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:59:18]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RESTART         JOB: cmd99           MACHINE: host1
[09/02/2022 13:59:18]      <Application FAILURE Restart.>
[09/02/2022 13:59:18]      CAUAJM_I_40109 Scheduled [cmd99 110.10827.3] due to RESTART event.
[09/02/2022 13:59:26]      CAUAJM_I_80021 The agent inventory service has evaluated the statuses of 2 machine(s) in 0.101 seconds.
[09/02/2022 13:59:43]      CAUAJM_I_40245 EVENT: STARTJOB         JOB: cmd99
[09/02/2022 13:59:43]      <Scheduled due to RESTART event.>
[09/02/2022 13:59:43]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: STARTING        JOB: cmd99           MACHINE: host1
[09/02/2022 13:59:43]      CAUAJM_I_10082 [host1 connected for cmd99 110.10827.4]
[09/02/2022 13:59:44]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RUNNING         JOB: cmd99           MACHINE: host1
[09/02/2022 13:59:44]      <Executing at WA_AGENT>
[09/02/2022 13:59:44]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE         JOB: cmd99           MACHINE: host1 EXITCODE:  127
[09/02/2022 13:59:44]      CAUAJM_I_40245 EVENT: ALARM            ALARM: MAX_RETRYS       JOB: cmd99           MACHINE: host1
[09/02/2022 13:59:44]      <Have EXCEEDED the Max # (3) of application restarts.>

$ autoflags -a
0028 LINUX ORA 12.0 01.00 fd0a0d81 
=================================================================================================================================================

The same behavior exists in 11.3.6 SP8.
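
The two test runs above suggest a simple rule for how many alarms to expect when every attempt fails. The sketch below encodes only what those runs show: `expected_alarms` is a hypothetical helper, not part of the product, and it assumes n_retrys is at least 1 and that every attempt ends in an application failure.

```shell
# expected_alarms N_RETRYS ALARM_IF_FAIL
# Hypothetical helper summarizing the behavior observed in the tests above.
# Assumes N_RETRYS >= 1 and that every attempt fails; the n_retrys: 0 case
# was not tested in this article and is not covered here.
expected_alarms() {
  n_retrys=$1
  alarm_if_fail=$2
  # Initial run plus one restart per allowed retry.
  attempts=$((n_retrys + 1))
  if [ "$alarm_if_fail" -eq 1 ]; then
    jobfailure=$attempts   # one JOBFAILURE alarm per failed attempt
  else
    jobfailure=0           # per-attempt alarms are suppressed
  fi
  # MAX_RETRYS was raised once in both test runs, regardless of alarm_if_fail.
  echo "attempts=$attempts JOBFAILURE=$jobfailure MAX_RETRYS=1"
}

expected_alarms 3 1   # prints: attempts=4 JOBFAILURE=4 MAX_RETRYS=1
expected_alarms 3 0   # prints: attempts=4 JOBFAILURE=0 MAX_RETRYS=1
```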