Prevent Job Backlog Execution After Database Restore

Article ID: 434414

Products

AutoSys Workload Automation

Issue/Introduction

When an AutoSys database is restored from a backup (for example, one that is 2–3 days old) and the Scheduler is started, the Scheduler detects a gap between the last processed event and the current system time. It then attempts to "catch up" by running every time-based and condition-based job that was missed in that window, which can flood agents and disrupt the applications those jobs drive.

Environment

AutoSys Workload Automation 12.1.x - 24.x
Infrastructure: AWS (RDS or EC2) or On-Premise

Resolution

Step 1: Halt AutoSys Services

Before bringing the restored database online, ensure all services are shut down to prevent the Scheduler from immediately processing the backlog:

Stop the Scheduler (waae_sched)
Stop the Application Server (waae_server)
Stop the Web Server (waae_webserver)
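
The shutdown order above can be sketched as a small shell function. This is a sketch under assumptions, not confirmed vendor syntax: it assumes a Linux installation where services are controlled with unisrvcntr and are registered as <service>.<instance> (for example waae_sched.ACE, where ACE is a hypothetical instance name). The function name and the DRY_RUN switch are illustrative; by default the function only prints the commands so they can be reviewed first.

```shell
# Sketch: stop the AutoSys services in the order listed in Step 1.
# Assumption: unisrvcntr is the service controller on this host and each
# service is registered as <service>.<instance>.
# DRY_RUN=1 (the default) prints the commands instead of running them.
stop_autosys_services() {
    instance="$1"
    if [ -z "$instance" ]; then
        echo "usage: stop_autosys_services INSTANCE" >&2
        return 2
    fi
    for svc in waae_sched waae_server waae_webserver; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "unisrvcntr stop ${svc}.${instance}"
        else
            unisrvcntr stop "${svc}.${instance}"
        fi
    done
}
```

Calling stop_autosys_services "$AUTOSERV" prints the three stop commands; running it with DRY_RUN=0 executes them.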

Step 2: Database and Endpoint Verification (AWS Specific)

If restoring the RDS snapshot produces a new endpoint hostname, update the database connection settings (typically in $AUTOUSER/config.$AUTOSERV on Linux, or through the AutoSys Administrator GUI on Windows) before starting any services.
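
Before editing, it can help to locate every line in the configuration file that still points at the old endpoint. The sketch below assumes the old RDS hostname is known. The function name and the hostname in the usage note are hypothetical, and the sketch deliberately does not assume any particular parameter name.

```shell
# Sketch: list config lines that still reference the old database endpoint.
# Arguments: old endpoint hostname, then the path to the config file.
find_old_endpoint_refs() {
    old_endpoint="$1"
    config_file="$2"
    if [ -z "$old_endpoint" ] || [ ! -r "$config_file" ]; then
        echo "usage: find_old_endpoint_refs OLD_HOST CONFIG_FILE" >&2
        return 2
    fi
    # -F matches the hostname literally (dots are not wildcards);
    # -n prints line numbers so matches are easy to find in an editor.
    grep -nF -- "$old_endpoint" "$config_file"
}
```

For example, find_old_endpoint_refs olddb.example.com "$AUTOUSER/config.$AUTOSERV" prints each matching line with its line number. No output and an exit status of 1 means the old hostname no longer appears in the file.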

Step 3: Prevent "Catch-Up" Avalanche

Choose one of the following officially supported methods:

Method A: Global Auto Hold (Recommended)

Configure the Scheduler to start in Global Auto Hold mode by setting "GlobalAutoHold=1" in the configuration file.
Start the Scheduler service. It will read the missed events and place eligible jobs into "ON_HOLD" status instead of executing them.
Allow the Scheduler to run until it catches up to the current system clock.
Stop the Scheduler, set "GlobalAutoHold=0", and restart it normally.
Cleanup: Manually send "FORCE_STARTJOB" to the jobs that still need to run, or change the status of the remaining jobs to "SUCCESS" so that downstream dependencies are satisfied.
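
The two configuration changes in Method A (setting the flag to 1 before the catch-up run and back to 0 afterwards) can be sketched as one helper. Assumptions: the flag lives in a plain key=value file such as $AUTOUSER/config.$AUTOSERV with one setting per line. The function name and the .bak backup copy are illustrative.

```shell
# Sketch: set GlobalAutoHold to 0 or 1 in a key=value config file.
# Arguments: the value (0 or 1), then the path to the config file.
# A copy of the original file is kept next to it as <file>.bak.
set_global_auto_hold() {
    value="$1"
    config="$2"
    case "$value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 2 ;;
    esac
    if [ ! -w "$config" ]; then
        echo "cannot write to config file: $config" >&2
        return 2
    fi
    cp -p "$config" "$config.bak" || return 1
    if grep -q '^[[:space:]]*GlobalAutoHold=' "$config"; then
        # Rewrite the existing setting, reading from the backup copy.
        sed "s/^[[:space:]]*GlobalAutoHold=.*/GlobalAutoHold=${value}/" \
            "$config.bak" > "$config"
    else
        # The setting is absent (or commented out): append it.
        printf 'GlobalAutoHold=%s\n' "$value" >> "$config"
    fi
}
```

With the Scheduler stopped, set_global_auto_hold 1 "$AUTOUSER/config.$AUTOSERV" would precede the catch-up start, and set_global_auto_hold 0 on the same file would precede the normal restart.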

Method B: JIL "Resave" Method

Start only the "Application Server" service (keep the Scheduler stopped).
Export all job definitions: autorep -J ALL -q > all_jobs.jil
Immediately re-insert them: jil < all_jobs.jil
- Note: This forces the database to recalculate next-start times based on the current clock, effectively erasing past queued STARTJOB events.
Start the Scheduler normally.
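
One caveat with the export-and-reload steps: autorep -q normally writes each definition as an insert_job statement, and jil may reject an insert for a job name that already exists in the restored database. A possible workaround is to rewrite the export to update_job statements before loading it. This is an assumption to verify on a non-production instance, not part of the documented procedure. The function and file names are illustrative.

```shell
# Sketch: rewrite an autorep -q export so each definition updates the
# existing job instead of inserting a new one.
# Arguments: exported JIL file, then the output file to create.
jil_insert_to_update() {
    src="$1"
    dst="$2"
    if [ ! -r "$src" ] || [ -z "$dst" ]; then
        echo "usage: jil_insert_to_update EXPORT.jil OUTPUT.jil" >&2
        return 2
    fi
    # Only an insert_job: keyword at the start of a line (after optional
    # indentation) is rewritten, so the same text appearing later in a
    # line, such as inside a command attribute, is left untouched.
    sed 's/^\([[:space:]]*\)insert_job:/\1update_job:/' "$src" > "$dst"
}
```

For example, jil_insert_to_update all_jobs.jil all_jobs_update.jil followed by jil < all_jobs_update.jil would load the rewritten definitions.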

Step 4: Validation

Verify the Scheduler is processing events at the current time using "autorep" and confirm agent connectivity with "autoping".
