AutoSys jobs stuck in STARTING state continuously cycle into RESTART status

Article ID: 436486

Products

Autosys Workload Automation

Issue/Introduction

Jobs enter a continuous restart loop without successfully executing.
This frequent cycling creates a race condition where standard management commands are ignored or queued.

Symptoms include:

  • Jobs cycle into RESTART status immediately after entering STARTING.
  • ON_HOLD, ON_ICE, or INACTIVE commands do not process.
  • Standard KILLJOB commands fail to terminate jobs in these states.
  • Scheduler performance may degrade, affecting other applications.
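
One way to confirm the cycling symptom is to count how often a job was moved to RESTART in the scheduler log. The sketch below is illustrative only: the function name `count_restarts` is hypothetical, and it assumes each log line carries the fields `STATUS: RESTART` and `JOB: <job_name>`, which may not match the log layout of every release.

```shell
#!/bin/sh
# Hypothetical helper: count RESTART status changes logged for one job.
# Assumption: each relevant log line contains "STATUS: RESTART" and
# "JOB: <job_name>" as whitespace-separated fields.
count_restarts() {
    job_name=$1
    log_file=$2
    awk -v job="$job_name" '
        /STATUS:[ \t]+RESTART/ {
            # Find the field that follows "JOB:" and compare it exactly.
            for (i = 1; i < NF; i++) {
                if ($i == "JOB:" && $(i + 1) == job) { n++; break }
            }
        }
        END { print n + 0 }
    ' "$log_file"
}
```

Called as `count_restarts [job_name] [scheduler_log_file]`, a count that keeps climbing between two runs a few seconds apart suggests the job is stuck in the loop described above.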

Environment

  • AutoSys Workload Automation 12.x
  • AutoSys Workload Automation 24.x

Cause

The issue is caused by an underlying account being locked in combination with extremely short job schedule intervals.
The tight interval triggers a race condition where the scheduler initiates a restart before manual event commands can be processed.

Resolution

Follow these steps to break the restart cycle and regain control of the job:

  1. Strip scheduling attributes via JIL: Create a temporary JIL file (e.g., stop_job.jil) and add the following lines to remove retries and date conditions:

    jil
    update_job: [job_name]
    n_retries: 0
    date_conditions: 0

    Note: Alternatively, update the definition with an impossible condition, such as condition: v(fakeGVAR)=65.

  2. Update the job definition: Run the following command to insert the changes into the database:

    bash
    jil < stop_job.jil

  3. Force the job to terminate: Execute the change status event to stop the cycling:

    bash
    sendevent -E CHANGE_STATUS -s TERMINATED -J [job_name]

    Note: If TERMINATED does not work, use -s FAILURE instead.

  4. Place the job on hold: Secure the job in a controlled state for further investigation:

    bash
    sendevent -E CHANGE_STATUS -s ON_HOLD -J [job_name]
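
The four steps above can be collected into a small wrapper script. This is a minimal sketch, not a supported tool: the function names (`build_stop_jil`, `run`, `break_restart_loop`) and the `DRY_RUN` and `STOP_STATUS` variables are hypothetical. It defaults to a dry run that only prints the commands, and it does not verify that the job actually left the restart loop.

```shell
#!/bin/sh
# Sketch only: helper names and the DRY_RUN/STOP_STATUS variables are hypothetical.
DRY_RUN=${DRY_RUN:-1}                    # 1 = print commands, 0 = execute them
STOP_STATUS=${STOP_STATUS:-TERMINATED}   # set to FAILURE if TERMINATED does not work

# Print the JIL that removes retries and date conditions for one job.
build_stop_jil() {
    printf 'update_job: %s\nn_retries: 0\ndate_conditions: 0\n' "$1"
}

# Echo the command in dry-run mode; otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY RUN: $*"
    else
        "$@"
    fi
}

break_restart_loop() {
    job_name=$1
    jil_file=${2:-stop_job.jil}

    # Step 1: write the temporary JIL file.
    build_stop_jil "$job_name" > "$jil_file"

    # Step 2: load the JIL into the database.
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY RUN: jil < $jil_file"
    else
        jil < "$jil_file" || return 1
    fi

    # Step 3: force the job out of the restart cycle.
    run sendevent -E CHANGE_STATUS -s "$STOP_STATUS" -J "$job_name" || return 1

    # Step 4: place the job on hold for investigation.
    run sendevent -E CHANGE_STATUS -s ON_HOLD -J "$job_name"
}
```

After sourcing the script, `break_restart_loop [job_name]` prints the planned commands. Setting `DRY_RUN=0` executes them, and `STOP_STATUS=FAILURE` switches step 3 to the fallback status from the note.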

Preventative Measure: To prevent future occurrences, limit how many times the scheduler attempts to restart a job by setting the MaxRestartTrys parameter in the $AUTOUSER/config.$AUTOSERV file:

text
MaxRestartTrys=1
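
Before editing the file, it can help to check the value currently in effect. The sketch below assumes the configuration file uses plain `name=value` lines and that commented lines start with `#`. The function name `get_max_restart_trys` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the active MaxRestartTrys value from a config file.
# Prints nothing if the parameter is absent or commented out.
# Assumption: the file uses "name=value" lines and "#" for comments.
get_max_restart_trys() {
    config_file=$1
    sed -n 's/^[[:space:]]*MaxRestartTrys[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
        "$config_file" | tail -n 1
}
```

For example, `get_max_restart_trys "$AUTOUSER/config.$AUTOSERV"` prints the configured number, or an empty line-free result when the parameter is not set explicitly.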

Additional Information

Possible alternative solution:

Set MaxRestartTrys=1 in the $AUTOUSER/config.$AUTOSERV file, then use sendevent to put the job on hold.

Scheduler MaxRestartTrys