Jobs enter a continuous restart loop without successfully executing.
This frequent cycling creates a race condition where standard management commands are ignored or queued.
Symptoms include:
ON_HOLD, ON_ICE, or INACTIVE commands do not process.
KILLJOB commands fail to terminate jobs in these states.
The issue occurs when an underlying account is locked and the job is scheduled at extremely short intervals.
The tight interval triggers a race condition where the scheduler initiates a restart before manual event commands can be processed.
Follow these steps to break the restart cycle and regain control of the job:
Strip scheduling attributes via JIL: Create a temporary JIL file (e.g., stop_job.jil) and add the following lines to remove retries and date conditions:
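A minimal sketch of such a file, written from the shell. The job name my_job is hypothetical, and the sketch assumes that retries and date conditions are cleared with the standard n_retrys and date_conditions JIL attributes:

```shell
JOB_NAME="my_job"        # hypothetical job name; substitute the looping job
JIL_FILE="stop_job.jil"  # temporary JIL file named in the step above

# Write an update_job definition that disables retries and date conditions.
cat > "$JIL_FILE" <<EOF
update_job: $JOB_NAME
n_retrys: 0
date_conditions: 0
EOF
```

Review the generated file before loading it, and keep a copy of the original job definition so the scheduling attributes can be restored later.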
Note: Alternatively, update the definition with an impossible condition, such as condition: v(fakeGVAR)=65.
Update the job definition: Run the following command to insert the changes into the database:
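The load is normally done by feeding the file to the jil command on standard input. The wrapper below is a sketch only: apply_jil is a hypothetical helper name, and it assumes the AutoSys environment has already been sourced in the current shell.

```shell
# Hypothetical helper: load a JIL file, refusing to run if it is missing or empty.
apply_jil() {
  jil_file=$1
  if [ ! -s "$jil_file" ]; then
    echo "JIL file $jil_file is missing or empty" >&2
    return 1
  fi
  # jil reads job definitions from standard input.
  jil < "$jil_file"
}
```

With the temporary file from the previous step, the call would be apply_jil stop_job.jil.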
Force the job to terminate: Execute the change status event to stop the cycling:
Note: If TERMINATED does not work, use -s FAILURE instead.
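As a sketch, the change status event can be wrapped so that the status is easy to switch. force_status is a hypothetical helper name and my_job a hypothetical job; the sendevent options shown (-E, -s, -J) are the standard ones for a CHANGE_STATUS event.

```shell
# Hypothetical helper: send a CHANGE_STATUS event; the status defaults to TERMINATED.
force_status() {
  job=$1
  status=${2:-TERMINATED}
  sendevent -E CHANGE_STATUS -s "$status" -J "$job"
}
```

Run force_status my_job first. If the job keeps cycling, run force_status my_job FAILURE.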
Place the job on hold: Secure the job in a controlled state for further investigation:
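A sketch of the hold step, followed by a status check with autorep so the result can be confirmed. hold_job is a hypothetical helper name, and the job name passed to it is a placeholder.

```shell
# Hypothetical helper: put a job on hold, then report its current status.
hold_job() {
  job=$1
  # autorep runs only if the hold event was sent successfully.
  sendevent -E JOB_ON_HOLD -J "$job" && autorep -J "$job"
}
```

For example, hold_job my_job. The event is processed asynchronously, so the autorep report may not reflect the hold immediately and may need to be rerun.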
Preventative measure: To prevent future occurrences, set MaxRestartTrys=1 in the $AUTOUSER/config.$AUTOSERV file.
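The edit can be made by hand in any editor. As an illustration only, the sketch below changes the parameter with a backup copy kept alongside the file. set_max_restart_trys is a hypothetical helper, and it assumes the parameter is stored as a plain MaxRestartTrys=value line.

```shell
# Hypothetical helper: set MaxRestartTrys in a config file, keeping a .bak copy.
set_max_restart_trys() {
  config_file=$1
  value=$2
  cp "$config_file" "$config_file.bak" || return 1
  if grep -q '^MaxRestartTrys=' "$config_file.bak"; then
    # Replace the existing setting.
    sed "s/^MaxRestartTrys=.*/MaxRestartTrys=$value/" "$config_file.bak" > "$config_file"
  else
    # The parameter is not present yet, so append it.
    echo "MaxRestartTrys=$value" >> "$config_file"
  fi
}
```

A call such as set_max_restart_trys "$AUTOUSER/config.$AUTOSERV" 1 would apply the value suggested in this article. Check the scheduler documentation for how configuration changes are picked up on your release.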
Possible additional solution:
Set MaxRestartTrys=1 in the $AUTOUSER/config.$AUTOSERV file.
Then use the sendevent command to put the job on hold.
Scheduler MaxRestartTrys