
Detecting Jobs with status FAULT_OTHER


Article ID: 89911



Products

CA Automic Workload Automation - Automation Engine

Issue/Introduction

Jobs that end with status FAULT_OTHER (status 1820) sometimes remain undetected and are therefore hard to handle.

This article explains how to detect Jobs with status FAULT_OTHER and possibly restart them.

Environment

Release: Automic Workload Automation
Component: Automation Engine

Cause

When a job cannot start, for example because the Agent name cannot be resolved or a password has changed, it ends with status FAULT_OTHER (1820). If such jobs are set to deactivate automatically, they do not appear in the Activities window and can remain undetected. They can then only be found by looking at the statistics.

Resolution

If these jobs are business critical, there are several ways to detect them:

 

1. At the level of the workflow (JOBP), you can set a Post Condition on the task that reruns a job (JOBS) that ended with status FAULT_OTHER (return code 1820).

IF task ended with status FAULT_OTHER

FINALLY restart task in x minutes

 

2. More generally, there is also the possibility to run a watchdog script for surveillance purposes:

The following example script determines whether the last run of a job named "JOBS.WIN.FAULT_OTHER" ended with status "FAULT_OTHER" and, if so, runs the job again.
For periodic jobs, the watchdog script must also run periodically, because each execution checks the status of the job only once.

! Read the RunID of the most recent run of the job from the statistics
:SET &RUNID# = GET_STATISTIC_DETAIL(, "RUNID", "JOBS.WIN.FAULT_OTHER")
! Read the status of that run
:SET &STATUS# = GET_UC_OBJECT_STATUS(JOBS, &RUNID#)
:PRINT "RUNID: &RUNID#  ended with STATUS: &STATUS#"

:IF &STATUS# = 1820
! Restart the job and/or start a notification process here (this example does both)
:   SET &RET# = ACTIVATE_UC_OBJECT(JOBS.WIN.FAULT_OTHER)
:   SET &RET2# = ACTIVATE_UC_OBJECT(CALL.ALARM.FAULT.OTHER)
:   PRINT "RUNID new run: &RET#"
:ENDIF
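
Because the watchdog checks the status only once per execution, a watchdog that runs more often than the job may see the same failed run several times. The following is a minimal sketch of one way to avoid restarting the same run twice. It assumes that a static VARA object exists under the hypothetical name VARA.WATCHDOG.LAST_RUNID; this object is not part of the original example and is used here only to remember the last RunID that was handled.

! Assumption: VARA.WATCHDOG.LAST_RUNID is a hypothetical static VARA object
:SET &RUNID# = GET_STATISTIC_DETAIL(, "RUNID", "JOBS.WIN.FAULT_OTHER")
:SET &LAST# = GET_VAR(VARA.WATCHDOG.LAST_RUNID, "JOBS.WIN.FAULT_OTHER")

! Only act if this run has not been handled by an earlier watchdog execution
:IF &RUNID# <> &LAST#
:   SET &STATUS# = GET_UC_OBJECT_STATUS(JOBS, &RUNID#)
:   IF &STATUS# = 1820
:      SET &RET# = ACTIVATE_UC_OBJECT(JOBS.WIN.FAULT_OTHER)
:      PRINT "RUNID new run: &RET#"
:   ENDIF
! Remember the RunID that was just checked
:   PUT_VAR VARA.WATCHDOG.LAST_RUNID, "JOBS.WIN.FAULT_OTHER", &RUNID#
:ENDIF

If the restarted run also ends with FAULT_OTHER, it gets a new RunID and is picked up by the next watchdog execution, so a permanently failing job would still be restarted once per watchdog cycle.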


3. Lastly, the following solution was proposed in the community. It is also more general, but it only detects jobs with status FAULT_OTHER:
https://community.broadcom.com/enterprisesoftware/communities/community-home/digestviewer/viewthread?MessageKey=d4835175-853a-4a67-9d09-b4a569a0d906&CommunityKey=2e1b01c9-f310-4635-829f-aead2f6587c4&tab=digestviewer#bmd4835175-853a-4a67-9d09-b4a569a0d906

In a script run:

:SET &FROMDATE# = SYS_DATE("YYYY-MM-DD")
:SET &TODATE# = SYS_DATE("YYYY-MM-DD")
:PRINT "FROM: &FROMDATE#   TO: &TODATE#"

:SET &HND#=PREP_PROCESS_VAR(VARA.SQLI.GET_1820)
:PROCESS &HND#
:   SET &NAME# = GET_PROCESS_LINE(&HND#,2)
:   SET &STATUS# = GET_PROCESS_LINE(&HND#,3)
:   SET &TIMESTAMP# = GET_PROCESS_LINE(&HND#,4)
:   PRINT "&NAME#  &STATUS#  &TIMESTAMP#"
:ENDPROCESS
:CLOSE_PROCESS &HND#


This is the SQLI VARA that is called:

select ah_name, ah_status, ah_timestamp4 from ah
where ah_client = &$CLIENT#
and ah_status=1820
and ah_timestamp4 > '&FROMDATE# 00:00:00.000'
and ah_timestamp4 < '&TODATE# 23:59:59.000'
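
The script above only prints the affected tasks. As a hedged sketch, the same loop could also restart each of them by combining it with ACTIVATE_UC_OBJECT from the watchdog example. This assumes that every name returned by VARA.SQLI.GET_1820 is an object that may safely be activated on its own; the query does not filter by object type, and a task that originally ran inside a workflow would be started standalone.

:SET &FROMDATE# = SYS_DATE("YYYY-MM-DD")
:SET &TODATE# = SYS_DATE("YYYY-MM-DD")

:SET &HND# = PREP_PROCESS_VAR(VARA.SQLI.GET_1820)
:PROCESS &HND#
:   SET &NAME# = GET_PROCESS_LINE(&HND#,2)
! Assumption: restarting directly is acceptable for every object found
:   SET &RET# = ACTIVATE_UC_OBJECT(&NAME#)
:   IF &RET# = 0
:      PRINT "Could not restart &NAME#"
:   ELSE
:      PRINT "Restarted &NAME# with RUNID &RET#"
:   ENDIF
:ENDPROCESS
:CLOSE_PROCESS &HND#

If the same name appears more than once in the result for the day, this sketch activates it once per row, so it may be preferable to select distinct names in the VARA or to start a notification instead of a restart.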