Any job defined with the n_retrys attribute raises a failure alarm on every retry. Previously, the failure alarm appeared to be raised only after the maximum number of retries was reached.
Please let us know whether this is a new feature or a bug.
Release : 11.3.6
Component : CA Workload Automation AE (AutoSys)
The expected behavior is that if the job fails, is restarted, and fails again, you will
see multiple JOBFAILURE alarms and eventually a MAX_RETRYS alarm.
If you have seen different behavior, please provide your complete example and the exact version.
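The expected alarm sequence described above can be sketched as a small model. This is only an illustration inferred from the test logs in this article, not scheduler code: the function name `expected_alarms` is hypothetical, and the model assumes a job whose every run fails and an n_retrys value of at least 1.

```python
def expected_alarms(n_retrys: int, alarm_if_fail: bool) -> list[str]:
    """Model the alarms raised for a job whose every run fails.

    Simplified model based on the scheduler logs shown in this article;
    it is an assumption, not the product's actual implementation.
    """
    if n_retrys < 1:
        # The tests in this article only cover jobs with retries configured.
        raise ValueError("this sketch only models n_retrys >= 1")

    alarms = []
    attempts = n_retrys + 1  # the initial run plus n_retrys restarts
    for _ in range(attempts):
        if alarm_if_fail:
            # One JOBFAILURE alarm per failed run, including each retry.
            alarms.append("JOBFAILURE")
    # MAX_RETRYS is raised once the restarts are exhausted,
    # whether or not alarm_if_fail is set.
    alarms.append("MAX_RETRYS")
    return alarms


if __name__ == "__main__":
    print(expected_alarms(3, alarm_if_fail=True))
    print(expected_alarms(3, alarm_if_fail=False))
```

With n_retrys: 3, the model yields four JOBFAILURE alarms followed by one MAX_RETRYS alarm when alarm_if_fail is 1, and only the MAX_RETRYS alarm when alarm_if_fail is 0, which matches the two test runs in this article.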
--- Here is my test using 12.0 SP1 ---
/* ----------------- cmd99 ----------------- */
insert_job: cmd99 job_type: CMD
command: command99
machine: host1
owner: autosys@host1
permission:
date_conditions: 0
n_retrys: 3
alarm_if_fail: 1
alarm_if_terminated: 1
$ sendevent -E STARTJOB -J cmd99
$ autosyslog -e
Monitoring AutoSys Workload Automation Scheduler Log:
/opt/CA/WorkloadAutomationAE/autouser.R12/out/event_demon.R12
*** To break out type control-c (^c) ***
[09/02/2022 13:37:06] CAUAJM_I_40245 EVENT: STARTJOB JOB: cmd99
[09/02/2022 13:37:06] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: STARTING JOB: cmd99 MACHINE: host1
[09/02/2022 13:37:06] CAUAJM_I_10082 [host1 connected for cmd99 110.10816.1]
[09/02/2022 13:37:07] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RUNNING JOB: cmd99 MACHINE: host1
[09/02/2022 13:37:07] <Executing at WA_AGENT>
[09/02/2022 13:37:07] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: FAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127
[09/02/2022 13:37:07] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127
[09/02/2022 13:37:07] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RESTART JOB: cmd99 MACHINE: host1
[09/02/2022 13:37:07] <Application FAILURE Restart.>
[09/02/2022 13:37:07] CAUAJM_I_40109 Scheduled [cmd99 110.10816.1] due to RESTART event.
[09/02/2022 13:37:20] CAUAJM_I_80021 The agent inventory service has evaluated the statuses of 2 machine(s) in 0.201 seconds.
[09/02/2022 13:37:22] CAUAJM_I_40245 EVENT: STARTJOB JOB: cmd99
[09/02/2022 13:37:22] <Scheduled due to RESTART event.>
[09/02/2022 13:37:22] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: STARTING JOB: cmd99 MACHINE: host1
[09/02/2022 13:37:22] CAUAJM_I_10082 [host1 connected for cmd99 110.10816.2]
[09/02/2022 13:37:23] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RUNNING JOB: cmd99 MACHINE: host1
[09/02/2022 13:37:23] <Executing at WA_AGENT>
[09/02/2022 13:37:23] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: FAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127
[09/02/2022 13:37:23] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127
[09/02/2022 13:37:23] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RESTART JOB: cmd99 MACHINE: host1
[09/02/2022 13:37:23] <Application FAILURE Restart.>
[09/02/2022 13:37:23] CAUAJM_I_40109 Scheduled [cmd99 110.10816.2] due to RESTART event.
[09/02/2022 13:37:43] CAUAJM_I_40245 EVENT: STARTJOB JOB: cmd99
[09/02/2022 13:37:43] <Scheduled due to RESTART event.>
[09/02/2022 13:37:43] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: STARTING JOB: cmd99 MACHINE: host1
[09/02/2022 13:37:43] CAUAJM_I_10082 [host1 connected for cmd99 110.10816.3]
[09/02/2022 13:37:44] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RUNNING JOB: cmd99 MACHINE: host1
[09/02/2022 13:37:44] <Executing at WA_AGENT>
[09/02/2022 13:37:44] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: FAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127
[09/02/2022 13:37:44] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127
[09/02/2022 13:37:44] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RESTART JOB: cmd99 MACHINE: host1
[09/02/2022 13:37:44] <Application FAILURE Restart.>
[09/02/2022 13:37:44] CAUAJM_I_40109 Scheduled [cmd99 110.10816.3] due to RESTART event.
[09/02/2022 13:38:00] ----------------------------------------
[09/02/2022 13:38:09] CAUAJM_I_40245 EVENT: STARTJOB JOB: cmd99
[09/02/2022 13:38:09] <Scheduled due to RESTART event.>
[09/02/2022 13:38:09] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: STARTING JOB: cmd99 MACHINE: host1
[09/02/2022 13:38:09] CAUAJM_I_10082 [host1 connected for cmd99 110.10816.4]
[09/02/2022 13:38:10] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RUNNING JOB: cmd99 MACHINE: host1
[09/02/2022 13:38:10] <Executing at WA_AGENT>
[09/02/2022 13:38:10] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: FAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127
[09/02/2022 13:38:10] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127
[09/02/2022 13:38:10] CAUAJM_I_40245 EVENT: ALARM ALARM: MAX_RETRYS JOB: cmd99 MACHINE: host1
[09/02/2022 13:38:10] <Have EXCEEDED the Max # (3) of application restarts.>
If I set alarm_if_fail to 0, I no longer get the JOBFAILURE alarms, but I still get the final MAX_RETRYS alarm.
$ jil
jil>>1> update_job: cmd99
jil>>2> alarm_if_fail: 0
jil>>3> exit
______________________________________________________________________________
CAUAJM_I_50323 Inserting/Updating job: cmd99
CAUAJM_I_50205 Database Change WAS Successful!
______________________________________________________________________________
CAUAJM_I_52301 Exit Code = 0
______________________________________________________________________________
$ autorep -q -J cmd99
/* ----------------- cmd99 ----------------- */
insert_job: cmd99 job_type: CMD
command: command99
machine: host1
owner: autosys@host1
permission:
date_conditions: 0
n_retrys: 3
alarm_if_fail: 0
alarm_if_terminated: 1
$ sendevent -E STARTJOB -J cmd99
$ autosyslog -e
Monitoring AutoSys Workload Automation Scheduler Log:
/opt/CA/WorkloadAutomationAE/autouser.R12/out/event_demon.R12
*** To break out type control-c (^c) ***
[09/02/2022 13:58:26] CAUAJM_I_80021 The agent inventory service has evaluated the statuses of 2 machine(s) in 0.101 seconds.
[09/02/2022 13:58:40] CAUAJM_I_40245 EVENT: STARTJOB JOB: cmd99
[09/02/2022 13:58:40] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: STARTING JOB: cmd99 MACHINE: host1
[09/02/2022 13:58:40] CAUAJM_I_10082 [host1 connected for cmd99 110.10827.1]
[09/02/2022 13:58:41] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RUNNING JOB: cmd99 MACHINE: host1
[09/02/2022 13:58:41] <Executing at WA_AGENT>
[09/02/2022 13:58:41] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: FAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127
[09/02/2022 13:58:41] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RESTART JOB: cmd99 MACHINE: host1
[09/02/2022 13:58:41] <Application FAILURE Restart.>
[09/02/2022 13:58:41] CAUAJM_I_40109 Scheduled [cmd99 110.10827.1] due to RESTART event.
[09/02/2022 13:58:56] CAUAJM_I_40245 EVENT: STARTJOB JOB: cmd99
[09/02/2022 13:58:56] <Scheduled due to RESTART event.>
[09/02/2022 13:58:56] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: STARTING JOB: cmd99 MACHINE: host1
[09/02/2022 13:58:56] CAUAJM_I_10082 [host1 connected for cmd99 110.10827.2]
[09/02/2022 13:58:57] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RUNNING JOB: cmd99 MACHINE: host1
[09/02/2022 13:58:57] <Executing at WA_AGENT>
[09/02/2022 13:58:57] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: FAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127
[09/02/2022 13:58:57] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RESTART JOB: cmd99 MACHINE: host1
[09/02/2022 13:58:57] <Application FAILURE Restart.>
[09/02/2022 13:58:57] CAUAJM_I_40109 Scheduled [cmd99 110.10827.2] due to RESTART event.
[09/02/2022 13:59:00] ----------------------------------------
[09/02/2022 13:59:17] CAUAJM_I_40245 EVENT: STARTJOB JOB: cmd99
[09/02/2022 13:59:17] <Scheduled due to RESTART event.>
[09/02/2022 13:59:17] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: STARTING JOB: cmd99 MACHINE: host1
[09/02/2022 13:59:17] CAUAJM_I_10082 [host1 connected for cmd99 110.10827.3]
[09/02/2022 13:59:18] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RUNNING JOB: cmd99 MACHINE: host1
[09/02/2022 13:59:18] <Executing at WA_AGENT>
[09/02/2022 13:59:18] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: FAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127
[09/02/2022 13:59:18] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RESTART JOB: cmd99 MACHINE: host1
[09/02/2022 13:59:18] <Application FAILURE Restart.>
[09/02/2022 13:59:18] CAUAJM_I_40109 Scheduled [cmd99 110.10827.3] due to RESTART event.
[09/02/2022 13:59:26] CAUAJM_I_80021 The agent inventory service has evaluated the statuses of 2 machine(s) in 0.101 seconds.
[09/02/2022 13:59:43] CAUAJM_I_40245 EVENT: STARTJOB JOB: cmd99
[09/02/2022 13:59:43] <Scheduled due to RESTART event.>
[09/02/2022 13:59:43] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: STARTING JOB: cmd99 MACHINE: host1
[09/02/2022 13:59:43] CAUAJM_I_10082 [host1 connected for cmd99 110.10827.4]
[09/02/2022 13:59:44] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: RUNNING JOB: cmd99 MACHINE: host1
[09/02/2022 13:59:44] <Executing at WA_AGENT>
[09/02/2022 13:59:44] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: FAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127
[09/02/2022 13:59:44] CAUAJM_I_40245 EVENT: ALARM ALARM: MAX_RETRYS JOB: cmd99 MACHINE: host1
[09/02/2022 13:59:44] <Have EXCEEDED the Max # (3) of application restarts.>
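To compare runs like the two above without reading the log by eye, the ALARM events can be counted per job. The following is a minimal sketch that assumes the log line layout shown in this article; the helper name `count_alarms` and the regular expression are illustrative, not part of the product.

```python
import re
from collections import Counter

# Matches scheduler log lines such as:
#   ... EVENT: ALARM ALARM: JOBFAILURE JOB: cmd99 MACHINE: host1 ...
# The pattern is an assumption based on the log excerpts in this article.
ALARM_RE = re.compile(r"EVENT: ALARM\s+ALARM: (\S+)\s+JOB: (\S+)")


def count_alarms(log_lines):
    """Return a Counter keyed by (job, alarm) for every ALARM event found."""
    counts = Counter()
    for line in log_lines:
        match = ALARM_RE.search(line)
        if match:
            alarm, job = match.groups()
            counts[(job, alarm)] += 1
    return counts


if __name__ == "__main__":
    # Three lines copied from the logs in this article.
    sample = [
        "[09/02/2022 13:38:10] CAUAJM_I_40245 EVENT: CHANGE_STATUS STATUS: FAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127",
        "[09/02/2022 13:38:10] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: cmd99 MACHINE: host1 EXITCODE: 127",
        "[09/02/2022 13:38:10] CAUAJM_I_40245 EVENT: ALARM ALARM: MAX_RETRYS JOB: cmd99 MACHINE: host1",
    ]
    for (job, alarm), n in sorted(count_alarms(sample).items()):
        print(job, alarm, n)
```

Passing the lines of a saved copy of the scheduler log to `count_alarms` gives the number of JOBFAILURE and MAX_RETRYS alarms per job, so a run with alarm_if_fail: 1 can be compared against a run with alarm_if_fail: 0.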
$ autoflags -a
0028 LINUX ORA 12.0 01.00 fd0a0d81
=================================================================================================================================================
The same behavior exists in 11.3.6 SP8.