Following enhancements made to the r12.0 Workload Automation System Agent, the Agent no longer shuts down on Linux servers when free space in its file system drops below the minimum space threshold.
However, although the machine remains online and the Agent does switch to persistent memory mode, once there is no space left (zero bytes) any jobs that try to run get stuck in STARTING state. The jobs remain stuck in STARTING even after space is made available again, and their status has to be changed manually before they will run again.
Messages such as the following are seen in the $AUTOUSER/out/event_demon.$AUTOSERV log when the jobs get stuck in STARTING:
<COMM_ERR_14 Agent on machine [agent_host] has not acknowledged this job request. Please investigate the status of this job.>
The Agent is unable to acknowledge the job request because it cannot write to its <agent_install_dir>/database directory to track the jobs coming in.
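As a rough illustration of how the Scheduler log could be checked for this condition, the following Python sketch counts COMM_ERR_14 messages per machine. It assumes only the message text quoted above. The function names are hypothetical, and any other fields on the log line (timestamps, message codes) are ignored rather than parsed.

```python
import os
import re

# Pattern built from the message text quoted in this article; anything else
# on the log line is ignored.
COMM_ERR_14 = re.compile(
    r"COMM_ERR_14 Agent on machine \[([^\]]+)\] has not acknowledged this job request"
)


def unacknowledged_machines(log_lines):
    """Return a dict mapping machine name to the number of COMM_ERR_14 hits."""
    counts = {}
    for line in log_lines:
        match = COMM_ERR_14.search(line)
        if match:
            machine = match.group(1)
            counts[machine] = counts.get(machine, 0) + 1
    return counts


def scan_scheduler_log(path="$AUTOUSER/out/event_demon.$AUTOSERV"):
    """Hypothetical helper: expand the environment variables and scan the log."""
    with open(os.path.expandvars(path), errors="replace") as handle:
        return unacknowledged_machines(handle)
```

Calling scan_scheduler_log() in an environment where AUTOUSER and AUTOSERV are set would show which Agent machines have unacknowledged job requests; each affected job still has to be identified and its status changed manually.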
Release : 12.0
Component : CA Workload Automation System Agent
This is working as designed. The Scheduler does not receive an acknowledgement from the Agent, so the job remains in STARTING state and requires manual intervention.
The Agent is able to switch to persistent memory mode when free space drops below the critical threshold, but if the file system is completely depleted (0 bytes), the Agent cannot acknowledge new jobs, and they end up stuck in STARTING until someone intervenes manually.
The idea behind this new feature is to allow the Agent to run a little longer while action is taken to free up space so that the critical threshold is no longer breached. The Agent can recover once space is freed, and jobs can then run again, but it is not meant to acknowledge jobs and queue them up or run them later if the file system actually reaches 0 bytes.
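Because the Agent only buys time before the file system is fully depleted, it may help to watch the free space on the Agent's file system from outside the Agent. The Python sketch below is one minimal way to do that. The threshold values and function names are hypothetical and are not the Agent's own settings; the path to check would be the Agent installation directory on the machine in question.

```python
import shutil

# Hypothetical thresholds for illustration only; they are NOT read from the
# Agent's configuration.
WARN_BYTES = 100 * 1024 * 1024      # 100 MiB
CRITICAL_BYTES = 20 * 1024 * 1024   # 20 MiB


def classify_free_space(free_bytes, warn_bytes=WARN_BYTES, critical_bytes=CRITICAL_BYTES):
    """Classify free space as 'exhausted', 'critical', 'warning', or 'ok'."""
    if free_bytes <= 0:
        # The case described in this article: new jobs cannot be acknowledged.
        return "exhausted"
    if free_bytes < critical_bytes:
        return "critical"
    if free_bytes < warn_bytes:
        return "warning"
    return "ok"


def check_path(path, warn_bytes=WARN_BYTES, critical_bytes=CRITICAL_BYTES):
    """Classify the free space of the file system that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return classify_free_space(usage.free, warn_bytes, critical_bytes)
```

Run periodically against the Agent's file system, a check like this could raise an alert at the "warning" or "critical" stage, leaving time to free space before the file system reaches 0 bytes and jobs start to get stuck in STARTING.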