Job Failure alarm on "<Submission error>". What is causing this?

Article ID: 11441

Products

CA Workload Automation AE - Business Agents (AutoSys)
CA Workload Automation AE - Scheduler (AutoSys)
Workload Automation Agent
CA Workload Automation DE - System Agent (dSeries)

Issue/Introduction

Job Failure alarm on <Submission error>. What is causing this?

When running a job on an agent, I see the following error:

RUNNING 05/06/2016 08:26:23 1 PD 05/06/2016 08:26:24 xxxxxx

FAILURE 05/06/2016 08:26:23 1 PD 05/06/2016 08:26:24 

<Submission error> 

[*** ALARM ***] 

JOBFAILURE 05/06/2016 08:26:24 1 PD 05/06/2016 08:26:24 xxxxxxxx

[CHK_MAX_ALARM] 05/06/2016 08:56:23 1 PD 05/06/2016 08:26:24 xxxxxxx

 

In the agent joblog I see:

FRI May 06 12:00:03 2015 CAWA_E_20044 Could not setuid(XXX): error 11 (Resource temporarily unavailable).Error code: 11 

What is causing the error?

Environment

Agent 11.3.x; AutoSys AE 11.3.x and above

Resolution

The Agent normally runs as root. When it runs a job, it calls setuid to switch to the owner of the job before running the command.

If the OS does not allow the setuid call to succeed, the job fails with the submission error shown above.

Check the ulimit settings of the root user. The job inherits these limits even when it runs as another user.

To see the current values, run commands such as ulimit -a (all limits), ulimit -u (max user processes), and ulimit -l (max locked memory).
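As a rough sketch, the limits a running process actually has can be read from the process itself rather than from a fresh root login shell, whose limits may differ from those of the shell that started the agent. This assumes Linux with /proc and bash; show_limits is a hypothetical helper, and the PID used below is only a stand-in for the agent's PID on your host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the limits relevant to this article for a PID.
# Assumes Linux, where /proc/<pid>/limits lists the limits of a running process.
show_limits() {
    pid="$1"
    grep -E '^Max (processes|open files|locked memory)' "/proc/$pid/limits"
}

# Limits of the current shell (what a process started from here would inherit).
ulimit -u   # max user processes (nproc)
ulimit -n   # max open files
ulimit -l   # max locked memory

# Limits of an already running process. $$ (this shell) is only a stand-in
# for the PID of the agent process.
show_limits "$$"
```

If the values reported for the agent process are lower than those of a new root shell, the agent was probably started from an environment with tighter limits.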

The setuid error (error 11, Resource temporarily unavailable) can occur when the max user processes (nproc) limit has been reached.
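One way to check whether the nproc limit is close to being reached is to compare the number of tasks a user owns with the limit. The following is a minimal sketch that assumes Linux with /proc and bash; count_tasks_for_uid is a hypothetical helper. On Linux the nproc limit counts threads as well as processes, so the helper counts entries under /proc/<pid>/task. The count is an approximation, because ownership of /proc entries follows the effective UID rather than the real UID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the tasks (processes and threads) owned by a UID.
# Assumes Linux; /proc/<pid>/task/<tid> has one entry per thread.
count_tasks_for_uid() {
    uid="$1"
    n=0
    for t in /proc/[0-9]*/task/[0-9]*; do
        # A task may exit while we iterate; stat then prints nothing.
        if [ "$(stat -c %u "$t" 2>/dev/null)" = "$uid" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Stand-in UID: in practice use the UID of the job owner, e.g. "$(id -u someuser)".
uid="$(id -u)"
echo "tasks owned by uid $uid: $(count_tasks_for_uid "$uid")"
echo "max user processes of this shell: $(ulimit -u)"
```

A task count near the limit at the time of the failure would be consistent with the error 11 seen in the agent log.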

The System Agent inherits the ulimit values of the user who starts it (using either unisrvcntr or the /etc/init.d/waae_agent* script). The System Agent is started as the root user.

Increasing the max user processes limit (ulimit -u) for the root user to 4096 or unlimited, and then restarting the System Agent, resolved the issue.

You may also want to set the following: ulimit -n 65535 and ulimit -l unlimited.
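The settings above could be applied along these lines in the shell or script that starts the System Agent. This is only a sketch, assuming bash and a root shell; raise_limit is a hypothetical helper, and the actual start command is left as a comment because it depends on your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try to set one limit and report the outcome.
# $1 = ulimit option (-u, -n or -l), $2 = desired value.
raise_limit() {
    if ulimit "$1" "$2" 2>/dev/null; then
        echo "ulimit $1 set to $(ulimit "$1")"
    else
        echo "could not set ulimit $1 to $2 (current: $(ulimit "$1"))" >&2
        return 1
    fi
}

# Values from this article. Call the helper directly (not inside $(...)),
# so that the limits change in this shell and are inherited by the agent.
# "|| true" keeps going so that every limit is attempted and reported.
raise_limit -u 4096      || true
raise_limit -n 65535     || true
raise_limit -l unlimited || true

# ...start the System Agent here (unisrvcntr or the /etc/init.d/waae_agent* script)...
```

The new limits apply only to this shell and to processes started from it, which is why the agent has to be restarted after the change.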