A job is frequently being killed, and I don't see why, other than that its box has the n(boxjob) condition; no term_run_time is specified. The job only has a box and job_terminator set.
While looking into the frequent job termination, I noticed strange behavior in AutoSys. This job, along with other jobs, shows a "connected for KILLJOB <jobname>" entry in the event_demon log files, as far back as 2023 (the oldest logs available).
Is this "KILLJOB..." message normal? I need help determining why this job is being terminated all the time, and understanding why the KILLJOB message is tagged with a job name in the agent connection message.
[09/03/2025 21:03:03] CAUAJM_I_10082 [<MACHINE_NAME> connected for KILLJOB xxxxxxxxxxxxxxxa 132.9819.1]
[09/03/2025 21:03:11] CAUAJM_I_10082 [<MACHINE_NAME> connected for KILLJOB xxxxxxxxxxxxxxxb 132.9820.1]
[09/03/2025 21:03:18] CAUAJM_I_10082 [<MACHINE_NAME> connected for KILLJOB xxxxxxxxxxxxxxxc 132.9822.1]
[09/03/2025 21:03:26] CAUAJM_I_10082 [<MACHINE_NAME> connected for KILLJOB xxxxxxxxxxxxxxxd 132.9824.1]
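To see how often each job receives a KILLJOB connection, the log entries can be tallied per job name. The following is a minimal Python sketch, not an AutoSys utility: the regular expression is modeled only on the sample lines above, the trailing field (for example 132.9819.1) is treated as opaque text, and the function name count_killjobs is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the sample log lines only; other AutoSys releases may
# format CAUAJM_I_10082 differently (assumption).
KILLJOB_RE = re.compile(
    r"^\[(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]\s+CAUAJM_I_10082\s+"
    r"\[(?P<machine>\S+) connected for KILLJOB (?P<job>\S+) (?P<run>[^\]\s]+)\]"
)


def count_killjobs(lines):
    """Return a Counter mapping job name -> number of KILLJOB connection entries."""
    counts = Counter()
    for line in lines:
        match = KILLJOB_RE.match(line.strip())
        if match:
            counts[match.group("job")] += 1
    return counts


if __name__ == "__main__":
    # Two of the sample entries from the log excerpt above.
    sample = [
        "[09/03/2025 21:03:03] CAUAJM_I_10082 [<MACHINE_NAME> connected for KILLJOB xxxxxxxxxxxxxxxa 132.9819.1]",
        "[09/03/2025 21:03:11] CAUAJM_I_10082 [<MACHINE_NAME> connected for KILLJOB xxxxxxxxxxxxxxxb 132.9820.1]",
    ]
    for job, n in count_killjobs(sample).most_common():
        print(job, n)
```

In practice the function could be fed the lines of an event_demon log file (for example via open(path) as the iterable) to check whether the KILLJOB entries cluster around the box's restart times.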
When a self-looping box is terminated (e.g., manually killed or reaches a TERMINATED state due to internal conditions), the scheduler issues KILLJOB events to clean up any outstanding jobs or instances from that specific run. However, if the box is configured to restart immediately upon SUCCESS or TERMINATED status, it can restart before all these KILLJOB events are fully processed. When the scheduler then processes these delayed KILLJOB events, they can inadvertently target and terminate jobs in the new, already-started iteration of the box, leading to a cycle of unexpected looping terminations.
To prevent this, introduce a controlled delay into the box definition. This ensures the scheduler has sufficient time to process all pending KILLJOB events from a previous run before the box's next iteration begins.
Example Box Definition with Delay:
insert_job: 60sec_job
job_type: cmd
condition: t(outer_box)
command: sleep 60
machine: localhost
description: This job runs for 60 seconds when outer_box terminates.

insert_job: outer_box
job_type: box
condition: s(outer_box) | d(60sec_job,0)
description: This box self-loops on success, or waits for a 60-second delay after termination.
Explanation:
60sec_job: This acts as a delay mechanism. It is configured to run for 60 seconds only when outer_box transitions to a TERMINATED state (t(outer_box)).
outer_box condition:
  s(outer_box): Allows the box to self-loop immediately if it completes successfully.
  d(60sec_job,0): If outer_box is terminated, this condition becomes active. The 0 (lookback value) is crucial here; it means outer_box will wait for the next completion of 60sec_job rather than being satisfied by an earlier run.
When outer_box is terminated, 60sec_job is triggered. outer_box will then pause and wait for 60sec_job to complete its 60-second sleep. This delay gives the scheduler time to process all outstanding KILLJOB events from the previous run of outer_box before the box begins its next iteration, thereby preventing premature termination.