Job Failure alarm on <Submission error>. What is causing this?
When I run a job on an agent, I see the following error:
RUNNING 05/06/2016 08:26:23 1 PD 05/06/2016 08:26:24 xxxxxx
FAILURE 05/06/2016 08:26:23 1 PD 05/06/2016 08:26:24
[*** ALARM ***]
JOBFAILURE 05/06/2016 08:26:24 1 PD 05/06/2016 08:26:24 xxxxxxxx
[CHK_MAX_ALARM] 05/06/2016 08:56:23 1 PD 05/06/2016 08:26:24 xxxxxxx
In the agent joblog I see:
FRI May 06 12:00:03 2015 CAWA_E_20044 Could not setuid(XXX): error 11 (Resource temporarily unavailable).Error code: 11
What is causing the error?
Applies to: System Agent 11.3.x; AutoSys AE 11.3.x and above
The Agent normally runs as 'root'. When it runs a job, it calls setuid to switch to the owner of the job before executing the command.
If the OS does not allow the setuid call to succeed, the agent logs the error shown above.
What are the ulimit settings of the root user? The job inherits these settings even when it runs as another user.
What are the values of the ulimit? You can display them with commands such as ulimit -a (all limits) or ulimit -l (max locked memory).
The setuid error can occur when the nproc limit (the maximum number of user processes) has been reached.
The System Agent inherits the ulimit values of the user who starts it (using either unisrvcntr or the /etc/init.d/waae_agent* script). The System Agent is started as the root user.
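Because the limits are inherited at start time, it can help to compare the limits of the current shell with those of the agent process that is already running. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/limits is available. The show_limits function and the AGENT_PID variable are hypothetical names used only for illustration; AGENT_PID defaults to the current shell's PID so the sketch runs as-is.

```shell
#!/bin/sh
# Print the process, open-file and locked-memory limits of a running process.
# Assumption: Linux only, since this reads /proc/<pid>/limits.
show_limits() {
    pid="$1"
    grep -E '^Max (processes|open files|locked memory)' "/proc/${pid}/limits"
}

# Limits of the current shell (what a process started from here would inherit).
ulimit -a

# Limits of an already running process. AGENT_PID is a placeholder for the
# System Agent's PID; it falls back to this shell's PID.
AGENT_PID="${AGENT_PID:-$$}"
show_limits "$AGENT_PID"
```

If the "Max processes" value reported for the agent process is lower than the value you expect for root, the agent was probably started from a session with a lower limit and needs a restart after the limit is raised.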
Increasing the max user processes limit (ulimit -u) for the root user to 4096 or unlimited and restarting the System Agent resolved the issue.
You may also want to set ulimit -n 65535 and ulimit -l unlimited.
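The settings above could be applied in the shell that starts the agent, for example as follows. This is a sketch only: try_ulimit is a hypothetical helper, not part of the product, and it reports rather than aborts when a limit cannot be set (for example when the shell lacks the privilege to raise it, or does not support the option).

```shell
#!/bin/sh
# try_ulimit FLAG VALUE: attempt to set one limit in the current shell
# and report whether it worked. Hypothetical helper for illustration.
try_ulimit() {
    if ulimit "$1" "$2" 2>/dev/null; then
        echo "set ulimit $1 $2"
    else
        echo "could not set ulimit $1 $2"
    fi
}

try_ulimit -u 4096        # max user processes (nproc)
try_ulimit -n 65535       # max open files
try_ulimit -l unlimited   # max locked memory

# Show the resulting limits; a process started from this shell inherits them.
ulimit -a
```

Run these lines as root before invoking whichever start method you use (unisrvcntr or the /etc/init.d/waae_agent* script), then restart the System Agent so that it picks up the new values.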