Inactive jobs no longer working

Article ID: 199480

Products

CA Workload Automation iXP

Issue/Introduction

Almost all of our processes include jobs that use the "SEND_EVENT" command to change other jobs/boxes to inactive. These jobs have started failing; when we force-start a failed job, it runs to completion 99% of the time.

I checked the event demon logs, and they show nothing beyond the fact that the job failed.

This has been happening all day and is requiring a lot of extra work by our operators because the processes will not complete when these jobs fail.

Here is a snippet from today's event demon log with one example.  I have attached the entire event demon log for today that contains multiple events.

 CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: STARTING        JOB: AFT_ACHORG_ALLOYA064_INACTIVE MACHINE: pauto
 CAUAJM_I_10082 [pautosys01 connected for AFT_ACHORG_ALLOYA064_INACTIVE 414328.1584758.1]
 CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: RUNNING         JOB: AFT_ACHORG_ALLOYA064_INACTIVE MACHINE: pauto
 CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE         JOB: AFT_ACHORG_ALLOYA064_INACTIVE MACHINE: pauto      EXITCODE:  1
 CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: AFT_ACHORG_ALLOYA064_INACTIVE MACHINE: pauto     EXITCODE:  1
CAUAJM_I_40245 EVENT: FORCE_STARTJOB   JOB: AFT_ACHORG_ALLOYA064_INACTIVE
 <julie.williams:>
CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: SUCCESS         JOB: AFT_ACHORG_ALLOYA064_INACTIVE MACHINE: pauto      EXITCODE:  0

Environment

Release : 11.3

Component : COMMON ASSET API

Cause

A large number of sockets were in the TIME_WAIT state on port 7163.
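
One way to confirm this condition is to count the TIME_WAIT entries for the port in netstat output. The following is a minimal sketch, not the procedure support used: the function name count_time_wait is hypothetical, and it assumes Linux-style "netstat -an" output in which the local and foreign addresses are the fourth and fifth columns and the state is the last column.

```shell
# Hypothetical helper: count sockets in TIME_WAIT that involve a given port.
# Reads "netstat -an" style lines on stdin, so it can be tried on saved output.
count_time_wait() {
    port="$1"
    awk -v p="$port" '
        # Last column is the state; columns 4 and 5 are local/foreign address.
        $NF == "TIME_WAIT" && ($4 ~ (":" p "$") || $5 ~ (":" p "$")) { n++ }
        END { print n + 0 }
    '
}

# Usage on a live host (assumes netstat is installed):
#   netstat -an | count_time_wait 7163
```

Running the count a few minutes apart shows whether the number of TIME_WAIT sockets is growing.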

Resolution

 

Investigation showed that csam was not closing many of its sockets after use.

Because a cron script runs autostatus continuously, the number of open sockets and pending TIME_WAIT connections increases gradually.

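To stabilize the environment, set the kernel parameter net.ipv4.tcp_tw_reuse to 1 in /etc/sysctl.conf and apply the change with "sysctl -p". The active value can be checked with "sysctl -n net.ipv4.tcp_tw_reuse".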
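
Persisting a kernel parameter such as net.ipv4.tcp_tw_reuse can be scripted so that repeated runs do not leave duplicate entries. The sketch below is illustrative only: the helper name set_sysctl_param is hypothetical, the value 1 (enabled) is an assumption because the article does not state a value, and the file must already exist.

```shell
# Hypothetical helper: set KEY = VALUE in a sysctl-style config file.
# Any existing assignment of KEY is dropped and the new line is appended.
set_sysctl_param() {
    file="$1" key="$2" value="$3"
    tmp="$(mktemp)" || return 1
    awk -v k="$key" '
        # Compare against the line with leading whitespace removed.
        { line = $0; sub(/^[ \t]+/, "", line) }
        # Skip lines of the form "KEY =" or "KEY=" for exactly this key.
        index(line, k) == 1 && substr(line, length(k) + 1) ~ /^[ \t]*=/ { next }
        { print }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the original file keeps its owner and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage as root (value 1 is an assumption, see above):
#   set_sysctl_param /etc/sysctl.conf net.ipv4.tcp_tw_reuse 1 && sysctl -p
```

Take a backup of /etc/sysctl.conf before editing it on a production scheduler host.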