When a Scheduler failover is triggered with the sendevent command, the process takes several minutes to complete.
The delay occurs before the shadow scheduler fully takes over and resumes job processing.
During testing, the lag time may be longer than expected.
SYMPTOMS:
Failover takes several minutes to complete
Lag time observed during Event Processor (EP) rollover
CONTEXT: Testing Scheduler Failover with command: sendevent -e stop_demon -v failover
AutoSys Workload Automation (AutoSys) 12.X, 24.X
Operating System: [Platform Independent]
EXPLANATION:
During a failover, the shadow scheduler must check the status of all defined agents.
It sends an update to every agent to inform it that the shadow is now the active scheduler.
This step typically takes a few minutes to complete.
If the environment contains many agents that are offline, missing, or unreachable, the process slows down significantly because each failed contact incurs connection timeouts and retries.
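To get a feel for how unreachable agents add up, the explanation above can be turned into a back-of-envelope estimate. The sketch below is illustrative only: the function name, the timeout, the retry count, and the assumption that agents are contacted in fixed-size batches are all hypothetical and are not documented AutoSys values or behavior.

```python
import math

def estimate_agent_contact_seconds(unreachable_agents, connect_timeout_s,
                                   retries, parallel_contacts=1):
    """Rough upper bound on time spent waiting on unreachable agents.

    All inputs are hypothetical; none of them are AutoSys defaults.
    """
    if parallel_contacts < 1:
        raise ValueError("parallel_contacts must be at least 1")
    attempts_per_agent = 1 + retries  # first attempt plus each retry
    per_agent_wait = attempts_per_agent * connect_timeout_s
    # Assumption: agents are contacted in batches of parallel_contacts.
    batches = math.ceil(unreachable_agents / parallel_contacts)
    return batches * per_agent_wait

# Illustrative numbers only: 40 dead agents, 10 s timeout, 2 retries,
# contacted one at a time.
print(estimate_agent_contact_seconds(40, 10, 2))  # 1200 seconds (20 minutes)
```

Even with modest made-up values, a few dozen stale machine definitions can account for a delay of several minutes, which is why the cleanup step comes first.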
STEPS:
CLEAN UP MACHINE DEFINITIONS
Review the current machine definitions in the environment.
Identify agents that are:
Decommissioned
Permanently offline
Unreachable
Remove or update these definitions so that the scheduler only attempts to contact active agents.
EXPECTED: Reduced failover time as the scheduler contacts fewer unreachable agents.
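One possible way to shortlist candidates for cleanup is a simple TCP reachability check against the hosts in your machine definitions. The sketch below is an independent helper, not an AutoSys tool: the function names and example host names are hypothetical, and port 7520 is only an assumed agent port, so substitute the port each agent is actually configured to use. A failed connection only shows that the host did not answer from where the script ran; confirm with the agent owner before removing a definition.

```python
import socket

def is_reachable(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

def find_unreachable(machines, timeout_s=3.0, probe=is_reachable):
    """Return the (host, port) pairs that could not be contacted."""
    return [(host, port) for host, port in machines
            if not probe(host, port, timeout_s)]

# Example with a hypothetical inventory (host names and port are assumptions):
#   machines = [("agent01.example.com", 7520), ("agent02.example.com", 7520)]
#   for host, port in find_unreachable(machines):
#       print(f"unreachable: {host}:{port}")
```

The probe parameter lets you swap in a different check (for example, one that wraps an existing monitoring tool) without changing the filtering logic.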
CAPTURE DEBUG LOGS (IF DELAY PERSISTS)
If the delay remains excessive after cleanup, enable debug logging on both the primary and shadow schedulers during a failover test.
Review the logs to identify specific timeouts or bottlenecks.
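When reviewing large logs, it can help to look for long pauses between consecutive entries, since a timeout usually shows up as a gap in the timestamps. The sketch below assumes each log line starts with a timestamp in the form [MM/DD/YYYY HH:MM:SS]; that format, the function name, and the sample lines are assumptions for illustration, so adjust the pattern to match your actual scheduler logs.

```python
import re
from datetime import datetime

# Assumed line prefix: [MM/DD/YYYY HH:MM:SS]; change this if your logs differ.
STAMP = re.compile(r"^\[(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]")

def find_gaps(lines, min_gap_s=30):
    """Yield (gap_seconds, line_before, line_after) for pauses >= min_gap_s."""
    prev_time = prev_line = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip continuation lines that carry no timestamp
        stamp = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if gap >= min_gap_s:
                yield gap, prev_line.rstrip(), line.rstrip()
        prev_time, prev_line = stamp, line

# Made-up sample lines; real log text will differ.
sample = [
    "[03/14/2024 10:15:02] scheduler log line A (made-up text)",
    "[03/14/2024 10:15:03] scheduler log line B (made-up text)",
    "[03/14/2024 10:16:33] scheduler log line C (made-up text)",
]
for gap, before, after in find_gaps(sample):
    print(f"{gap:.0f}s pause between:\n  {before}\n  {after}")
```

To scan a real file, pass an open file object as lines. The entries on either side of each reported gap are a reasonable starting point for identifying which agent or operation the scheduler was waiting on.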