Getting lost control errors for a job when an agent host is rebooted

Article ID: 214556

Updated On:

Products

CA Workload Automation AE - Scheduler (AutoSys)

Issue/Introduction

A job returns lost control errors when the agent host is rebooted.

Environment

Release : 3.59

Component : CA Workload Automation Database Agent

Resolution

If the system is rebooted, an FW job or a command job sends back a lost control
error and a job failure when the agent restarts. This happens regardless of the
value of oscomponent.noguardianprocess.

The lost control error and failure are suppressed after a system restart
for FW and command jobs only if persistence.coldstart=true is also set.
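
To see how an agent is currently configured, the two parameters named above can be looked up in the agent's parameter file. The following is a minimal sketch. It assumes the parameters live in the agent's agentparm.txt file, which this article does not state, and the helper name show_restart_params and the path are hypothetical placeholders.

```shell
# Hypothetical helper: print the two restart-related parameters from an
# agent parameter file. Commented-out lines (starting with #) are skipped.
show_restart_params() {
    grep -E '^[[:space:]]*(persistence\.coldstart|oscomponent\.noguardianprocess)[[:space:]]*=' "$1"
}

# Placeholder path; substitute the agent's actual installation directory.
# show_restart_params /path/to/agent/agentparm.txt
```

If neither parameter is set explicitly, the helper prints nothing and the agent is running with its defaults.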

FileTrigger (FT) jobs are handled differently. They monitor for the file in a thread
that is internal to the main agent. When the agent is restarted, even after a reboot,
it can resume monitoring for the file, and the job eventually runs to success, provided
the file shows up and becomes stable.

NOTE - If you set persistence.coldstart=true, the FT, FW, and command jobs are
all ignored completely when the agent restarts: no status is sent back and
none of them resumes running. You would need to manually send an ending status
for each job so that it is rescheduled for its next start time.
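
One way to send an ending status manually is a CHANGE_STATUS event. The sketch below only builds and prints the command so it can be reviewed first. The job name is hypothetical, and TERMINATED is just one possible ending status; which status is appropriate depends on your workflow and is not specified by this article.

```shell
# Hypothetical job name; replace with the job that was ignored on restart.
JOB_NAME="my_ft_job"

# Build the command first so it can be reviewed before it is run.
# TERMINATED is an assumption; pick the ending status that fits the job.
CMD="sendevent -E CHANGE_STATUS -s TERMINATED -J $JOB_NAME"
echo "$CMD"

# Uncomment to send the event (requires the AutoSys client environment):
# $CMD
```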

Users can issue sendevent -E KILLJOB against the job to terminate the running process on the agent before the agent host reboots.

This avoids the lost control events when the agent is restarted.
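
For a planned reboot, the KILLJOB step can be scripted for every job known to be running on that agent host. This is a minimal sketch: the job names are hypothetical, and the list of running jobs has to be gathered by whatever means your site uses. The function prints the commands rather than running them, so they can be checked before execution.

```shell
# Hypothetical list of jobs running on the agent host that is about to reboot.
JOBS="cmd_job_1 fw_job_1"

# Print one KILLJOB command per job; review the output, then run the
# commands from a shell that has the AutoSys client environment loaded.
kill_cmds() {
    for job in $JOBS; do
        echo "sendevent -E KILLJOB -J $job"
    done
}
kill_cmds
```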