AutoSys jobs/resources issue - jobs stuck in RESWAIT when resources appeared to be available.

Article ID: 240816

Products

Autosys Workload Automation, CA Workload Automation AE - Scheduler (AutoSys)

Issue/Introduction

Last month we had an issue where many AutoSys jobs were stuck waiting on resources.

We ended up fixing the problem, but we cannot explain why it occurred.

Many jobs were affected, such as this one:

insert_job: test-job1 job_type: cmd
machine: xx2
condition: s(test-job2)
resources: (test-resource,QUANTITY=1,FREE=A)

These jobs were waiting on the resource named test-resource.

When we looked at the resource in the Resources tab of WCC (we no longer have evidence of this), it showed that none of the resource was in use; that is, Amount in Use was 0.

About 1300 jobs were waiting on this resource at the time, but we could not determine why they were not running when all threads of the resource appeared to be free.

After a while, we found some other jobs that used the above resource AND another resource, for example:

insert_job: testjob3 job_type: cmd
machine: xx3
resources: (test-resource,QUANTITY=1,FREE=A) AND (test-resource2,QUANTITY=1,FREE=Y)

As you can see, this job uses the resource in question (test-resource) with an AND condition on a second resource, test-resource2. Because test-resource2 is specified with FREE=Y, the jobs that failed were not releasing their threads of test-resource2.

We could not explain why, but we suspected this was the issue, so we manually freed the threads of test-resource2. Once we did, the other jobs that were waiting on test-resource began executing.

That worked, but we cannot explain WHY it worked.

Why was a job like this:

insert_job: test-job1 job_type: cmd
machine: xx2
condition: s(test-job2)
resources: (test-resource,QUANTITY=1,FREE=A)

which uses only test-resource, not running and instead waiting on a resource that had threads available, only because another job was holding a different resource?

We assume it has something to do with the way the resources are AND'd together in this job:

insert_job: testjob3 job_type: cmd
machine: xx3
resources: (test-resource,QUANTITY=1,FREE=A) AND (test-resource2,QUANTITY=1,FREE=Y)

Environment

Release : 12.0

Component : CA Workload Automation AE (AutoSys)

Resolution

The jobs did not run because they depended on a resource that was also needed by a different, HIGHER PRIORITY job, which was itself stuck in RESWAIT waiting on its other resources.

The solution/workaround is to adjust the priority of the waiting jobs so that it is equal to or better than that of the blocking job.

Use autorep -Q ALL -d to review which jobs are waiting on which resources.

https://techdocs.broadcom.com/us/en/ca-enterprise-software/intelligent-automation/autosys-workload-automation/12-0/scheduling/ae-scheduling/real-and-virtual-machines/queuing-jobs/how-ca-workload-automation-ae-queues-jobs.html

The specific excerpt that applies to this case:
The scheduler queues jobs with virtual resource dependencies (whether global or machine-based) based on “like” resource names. A job in the RESWAIT state for one resource name automatically blocks all the lower priority jobs that specify the same resource name. It does not automatically block higher or equal priority jobs that specify the same resource name or a job that specifies a different resource name. The scheduler schedules the job as long as the resources are available.
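
To make the rule in the excerpt concrete, here is a small Python sketch that models one scheduling pass. It is a toy model written for this article, not the scheduler's actual implementation. The priority values (1 and 10), the unit counts, and the names Job and schedule are all hypothetical, and the model assumes that a lower priority number means a higher priority.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int   # assumption: lower number = higher priority
    needs: dict     # resource name -> units requested


def schedule(jobs, free_units):
    """Return {job name: "RUNNING" or "RESWAIT"} for one scheduling pass."""
    free = dict(free_units)   # copy so the caller's dict is left untouched
    best_waiter = {}          # resource name -> best priority among waiting jobs
    states = {}
    for job in sorted(jobs, key=lambda j: j.priority):
        # Blocked if a strictly higher priority job is already waiting
        # on any resource name this job also specifies.
        blocked = any(
            res in best_waiter and best_waiter[res] < job.priority
            for res in job.needs
        )
        has_units = all(free.get(res, 0) >= qty for res, qty in job.needs.items())
        if not blocked and has_units:
            for res, qty in job.needs.items():
                free[res] = free.get(res, 0) - qty
            states[job.name] = "RUNNING"
        else:
            # A waiting job is recorded against every resource name it specifies.
            for res in job.needs:
                best_waiter[res] = min(best_waiter.get(res, job.priority), job.priority)
            states[job.name] = "RESWAIT"
    return states


# Hypothetical priorities: testjob3 outranks test-job1.
blocker = Job("testjob3", priority=1, needs={"test-resource": 1, "test-resource2": 1})
waiter = Job("test-job1", priority=10, needs={"test-resource": 1})

# test-resource2 is exhausted, so testjob3 waits and blocks test-job1.
stuck = schedule([blocker, waiter], {"test-resource": 5, "test-resource2": 0})

# Manually freeing one unit of test-resource2 lets both jobs run.
freed = schedule([blocker, waiter], {"test-resource": 5, "test-resource2": 1})

# Giving test-job1 an equal priority also unblocks it.
equal = Job("test-job1", priority=1, needs={"test-resource": 1})
reprioritized = schedule([blocker, equal], {"test-resource": 5, "test-resource2": 0})
```

In this model, stuck leaves both jobs in RESWAIT even though test-resource has free units, which matches the behavior described in the issue. Both freed and reprioritized let test-job1 run, corresponding to the manual release of test-resource2 and to the priority workaround in the resolution.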
