A job continuously tries to update the AutoSys database and fails.
Running autorep against the job returns no results.
$ autorep -j TEST_JOB_1
$ autoflags -a
1589 LINUX ORA 11.3.6 SP7 2a0af2b3
[08/31/2020 02:46:31] CAUAJM_E_40225 Trouble processing Event [DEV0bjz6fc00]!
[08/31/2020 02:46:31] CAUAJM_E_10283 Exhausted maximum number of retries for scheduler operation [get_sched_info<423318>]
[08/31/2020 02:46:31] CAUAJM_E_40111 Unable to fetch job details using internal job identifier <423,318>. Event processing for DEV0bjz6fd00 aborted.
[08/31/2020 02:46:31] CAUAJM_I_40245 EVENT: ALARM ALARM: EVENT_HDLR_ERROR JOB: TEST_JOB_1
[08/31/2020 02:46:31] <An error occurred while processing event <DEV0bjz6ey00> for job [TEST_JOB_1 423318.206068755.0].>
[08/31/2020 02:46:31] CAUAJM_E_40225 Trouble processing Event [DEV0bjz6fd00]!
[08/31/2020 02:46:31] CAUAJM_E_10283 Exhausted maximum number of retries for scheduler operation [get_sched_info<423318>]
[08/31/2020 02:46:31] CAUAJM_E_40111 Unable to fetch job details using internal job identifier <423,318>. Event processing for DEV0bjz6fe00 aborted.
[08/31/2020 02:46:31] CAUAJM_I_40245 EVENT: ALARM ALARM: EVENT_HDLR_ERROR JOB: TEST_JOB_1
[08/31/2020 02:46:31] <An error occurred while processing event <DEV0bjz6ez00> for job [TEST_JOB_1 423318.206068755.0].>
[08/31/2020 02:46:31] CAUAJM_E_40225 Trouble processing Event [DEV0bjz6fe00]!
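The internal job identifier (joid) that the scheduler cannot resolve appears inside the CAUAJM_E_10283 lines of the excerpt above. As a minimal sketch, the helper below pulls the distinct joids out of scheduler log text read from standard input. The function name extract_joids is hypothetical, and the pattern assumes the log keeps the get_sched_info<NNN> form shown in this excerpt.

```shell
# Hypothetical helper: print each distinct joid found in CAUAJM_E_10283 lines.
# Reads scheduler log text from stdin; assumes the "get_sched_info<NNN>" form
# seen in the excerpt above.
extract_joids() {
    sed -n 's/.*CAUAJM_E_10283.*get_sched_info<\([0-9][0-9]*\)>.*/\1/p' | sort -u
}
```

It could be fed the scheduler log, for example with extract_joids < "$AUTOUSER/out/event_demon.$AUTOSERV", assuming the log is in that default location on your installation.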
Release : 11.3.6
Component : CA Workload Automation AE (AutoSys)
For unknown reasons, the job's definition was damaged in the database.
Its rows were missing from tables such as ujo_job_tree and ujo_job_status.
When an agent sent a completion event for the job, the scheduler could not process it because the ujo_job_status table had no entry for the joid.
The scheduler raised an EVENT_HDLR_ERROR alarm each time it retried the event.
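To see which tables still hold rows for the damaged job, one option is to count rows per table for the joid. The sketch below only prints the SQL text; it does not connect to the database. The function name print_joid_counts is hypothetical, and the table list is simply the set of tables named in this article, each assumed to have a joid column.

```shell
# Hypothetical helper: print one "select count(*)" statement per table for a joid.
# Only generates SQL text; pipe the output to your SQL client to run it.
print_joid_counts() {
    joid=$1
    # Accept digits only so nothing unexpected ends up in the SQL text.
    case $joid in
        ''|*[!0-9]*) echo "joid must be numeric" >&2; return 1 ;;
    esac
    for table in ujo_job ujo_job_status ujo_job_tree ujo_job_cond ujo_sched_info ujo_event; do
        printf "select '%s' as table_name, count(*) as row_count from %s where joid = %s;\n" \
            "$table" "$table" "$joid"
    done
}
```

Running print_joid_counts 423318 prints six statements. A count of zero in ujo_job_status or ujo_job_tree alongside a nonzero count in ujo_job would match the partial definition described above.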
The resolution was:
1. Stop the scheduler.
2. In the database, run the following statements, then commit if your SQL client does not autocommit:
   delete from ujo_job where joid = 423318;
   delete from ujo_job_status where joid = 423318;
   delete from ujo_job_tree where joid = 423318;
   delete from ujo_job_cond where joid = 423318;
   delete from ujo_sched_info where joid = 423318;
   delete from ujo_event where joid = 423318;
3. Perform a cold start of the agent that the job runs on. A cold start means stopping the agent, deleting the agent's database, log, and spool directories, and restarting the agent.
4. Restart the scheduler.
5. Reinsert the job's JIL definition.