A batch of DELETEJOB events was issued for a box job and all the jobs inside it using the 'sendevent -f' feature. The DELETEJOB event for the box job failed to process, producing the following sequence of errors in the Scheduler log:
[10/20/2020 16:46:19] CAUAJM_I_40245 EVENT: DELETEJOB JOB: test_delete_box
[10/20/2020 16:46:19] CAUAJM_E_10506 Failed to retrieve job with joid <124>.
[10/20/2020 16:46:19] CAUAJM_E_40225 Trouble processing Event [PL1z10000016]!
[10/20/2020 16:46:20] CAUAJM_I_40245 EVENT: ALARM ALARM: EVENT_HDLR_ERROR JOB: test_delete_box
[10/20/2020 16:46:20] <An error occurred while processing event <PL1z10000016> for job [test_delete_box 123.0.0].>
[10/20/2020 16:46:20] CAUAJM_E_40111 Unable to fetch job details using internal job identifier <123>. Event processing for PL1000022w00 aborted.
Release : 11.3.6 / 12.x
Component : CA Workload Automation AE (AutoSys)
When a DELETEJOB event is issued against a box job, the scheduler deletes the box and all the jobs inside it. To do that, it has to look up the jobs inside the box, which it does by joid (the internal job identifier). In the scenario described above, the sendevent batch also contained DELETEJOB events for the jobs inside the box. Because these batch events are processed concurrently, a race condition arises: the box job record in the database still references the joids of jobs that are in the process of being deleted. When the scheduler thread handling the box job delete tries to look those joids up in the database, it cannot find them, which results in the error sequence above.
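The race can be illustrated with a toy model. This is a minimal sketch, not the scheduler's actual implementation: the two dictionaries stand in for database tables and are assumptions, as are the function names and the child job name. The joid values 123 (box) and 124 (job inside the box) are taken from the log above.

```python
# Toy model only: these dicts stand in for database tables and are NOT the real schema.
job_table = {123: "test_delete_box", 124: "test_delete_child"}  # joid -> job name (child name is hypothetical)
box_members = {123: [124]}  # box joid -> joids of the jobs inside it


def delete_job(joid):
    """Remove a single job record, as a DELETEJOB for a job inside the box would."""
    job_table.pop(joid, None)


def delete_box(box_joid):
    """Look up every member joid, delete it, then delete the box itself.

    Returns a list of error strings; if any lookup fails, the box is left in place.
    """
    errors = []
    for joid in box_members.get(box_joid, []):
        if joid not in job_table:
            # The box still references a joid whose record is already gone.
            errors.append(f"Failed to retrieve job with joid <{joid}>.")
            continue
        delete_job(joid)
    if errors:
        return errors  # box delete is abandoned
    delete_job(box_joid)
    box_members.pop(box_joid, None)
    return errors


# Interleaving that loses the race: the child's own DELETEJOB lands first,
# then the box delete tries to look the child up by joid.
delete_job(124)
errors = delete_box(123)
print(errors)
```

In this interleaving the box delete reports the missing joid and the box record remains, which mirrors the shape of the logged failure.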
When deleting a box job and all of the jobs inside it with the DELETEJOB event, it is not necessary to issue separate DELETEJOB events for each job inside the box. To avoid the scenario described above, send a DELETEJOB event only to the box job itself; the scheduler then deletes the jobs inside it as well.
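If the list of jobs to delete is generated by a script, the redundant events can be filtered out before anything is sent. The following is a minimal sketch under stated assumptions: the helper prune_delete_targets, the parent_box mapping (job name to enclosing box name), and all job names other than test_delete_box are hypothetical, and how the mapping is obtained is left to the site. The generated lines use the standard sendevent -E DELETEJOB -J form.

```python
def prune_delete_targets(targets, parent_box):
    """Return the targets that are not inside a box (at any depth) that is also a target.

    targets    -- job names requested for deletion, in order
    parent_box -- hypothetical mapping: job name -> name of its enclosing box
    """
    target_set = set(targets)
    kept = []
    for job in targets:
        box = parent_box.get(job)
        seen = set()  # guards against a malformed, cyclic mapping
        covered = False
        while box is not None and box not in seen:
            if box in target_set:
                covered = True  # deleting this box already removes the job
                break
            seen.add(box)
            box = parent_box.get(box)
        if not covered:
            kept.append(job)
    return kept


# Hypothetical input: one box with two jobs inside it, plus an unrelated job.
parent_box = {
    "test_delete_job1": "test_delete_box",
    "test_delete_job2": "test_delete_box",
}
targets = ["test_delete_box", "test_delete_job1", "test_delete_job2", "standalone_job"]

commands = [f"sendevent -E DELETEJOB -J {job}" for job in prune_delete_targets(targets, parent_box)]
for command in commands:
    print(command)
```

With this input only the box and the unrelated job receive a DELETEJOB event; the two jobs inside the box are left for the box delete to handle.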