We had an issue last month where many AutoSys jobs were waiting on resources.
We ended up fixing the problem, but quite honestly, we cannot explain why the problem occurred.
We had many jobs, such as this one:
insert_job: PRD_EDPP_CMD_IP_PPSVC_CCSRV2_PI job_type: cmd
description: "Entity:C_CSR_V2 ingestion through Podium"
machine: xx2
condition: s(PRD_EDPP_CMD_IP_PPSVC_COMMERCIALVERF_PP)
resources: (AWM_ING_PODIUM_RP,QUANTITY=1,FREE=A)
These jobs were waiting on resource AWM_ING_PODIUM_RP.
When we looked at the resource in the Resources tab of WCC (which I don't currently have evidence of), it showed that no resources were in use, i.e., Amount in Use was 0.
There were about 1300 jobs waiting on this resource at the time, but we couldn't determine why they weren't running when all threads of the resource appeared to be open.
After a while, we found some other jobs that used the above resource AND another resource, for example this job:
insert_job: PRD_EDPP_CMD_IP_AAP_AAPAPPCRADDRESS_PI job_type: cmd
description: "Entity:AAP_App_crAddress ingestion through Podium"
machine: xx3
resources: (AWM_ING_PODIUM_RP,QUANTITY=1,FREE=A) AND (AWM_ING_PODIUM_AAP,QUANTITY=1,FREE=Y)
As you can see, the resource in question (AWM_ING_PODIUM_RP) is used in this job with an AND condition on a second resource called AWM_ING_PODIUM_AAP. Because AWM_ING_PODIUM_AAP is used with FREE=Y, the jobs that failed were not releasing their threads for resource AWM_ING_PODIUM_AAP.
I can't explain why, but I thought this might be the issue. So we manually freed the threads of resource AWM_ING_PODIUM_AAP. Once we did, the other jobs that were waiting on resource AWM_ING_PODIUM_RP began executing.
While that worked, we just can't explain WHY it worked.
Specifically, we can't explain why a job like this:
insert_job: PRD_EDPP_CMD_IP_PPSVC_CCSRV2_PI job_type: cmd
description: "Entity:C_CSR_V2 ingestion through Podium"
machine: xx2
condition: s(PRD_EDPP_CMD_IP_PPSVC_COMMERCIALVERF_PP)
resources: (AWM_ING_PODIUM_RP,QUANTITY=1,FREE=A)
which uses only resource AWM_ING_PODIUM_RP, was not running and was instead waiting on that resource, which had threads available, only because another job was holding a different resource.
We assume it has something to do with the way the resources were AND'd together in this job:
insert_job: PRD_EDPP_CMD_IP_AAP_AAPAPPCRADDRESS_PI job_type: cmd
description: "Entity:AAP_App_crAddress ingestion through Podium"
machine: xx3
resources: (AWM_ING_PODIUM_RP,QUANTITY=1,FREE=A) AND (AWM_ING_PODIUM_AAP,QUANTITY=1,FREE=Y)
Release : 12.0
Component : CA Workload Automation AE (AutoSys)
The jobs did not run because they depended on a resource (AWM_ING_PODIUM_RP) that was also needed by a different HIGHER PRIORITY job, which was stuck in RESWAIT because of its other resource (AWM_ING_PODIUM_AAP).
The solution/workaround is to adjust the priority of the waiting jobs to be equal to or higher than that of the blocking job.
Use autorep -Q ALL -d to review which jobs are waiting on which resources.
https://techdocs.broadcom.com/us/en/ca-enterprise-software/intelligent-automation/autosys-workload-automation/12-0/scheduling/ae-scheduling/real-and-virtual-machines/queuing-jobs/how-ca-workload-automation-ae-queues-jobs.html
The specific excerpt that applies here:
The scheduler queues jobs with virtual resource dependencies (whether global or machine-based) based on “like” resource names. A job in the RESWAIT state for one resource name automatically blocks all the lower priority jobs that specify the same resource name. It does not automatically block higher or equal priority jobs that specify the same resource name or a job that specifies a different resource name. The scheduler schedules the job as long as the resources are available.
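To make the quoted rule concrete, here is a minimal Python sketch of the blocking behavior. It is a simplified model, not AutoSys code or the scheduler's actual algorithm. The `Job` class, the `schedule` function, the priority values (1 and 10), and the resource quantities are all hypothetical, chosen only for illustration, and the sketch assumes that a lower priority number means a higher priority. It also assumes that a waiting job blocks on every resource name it specifies, which is how the explanation above describes the AND'd job.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int   # assumption: lower number = higher priority
    needs: dict     # resource name -> quantity required


def schedule(jobs, available):
    """Return (started, waiting) under a simplified 'like resource name' rule."""
    free = dict(available)
    started, waiting = [], []
    # resource name -> best (lowest-numbered) priority among jobs waiting on it
    waiting_priority = {}
    for job in sorted(jobs, key=lambda j: j.priority):
        # Blocked if a strictly higher-priority waiting job names the same resource.
        blocked = any(
            r in waiting_priority and waiting_priority[r] < job.priority
            for r in job.needs
        )
        fits = all(free.get(r, 0) >= q for r, q in job.needs.items())
        if not blocked and fits:
            for r, q in job.needs.items():
                free[r] -= q
            started.append(job.name)
        else:
            waiting.append(job.name)
            # A waiting job blocks on every resource name it specifies.
            for r in job.needs:
                waiting_priority[r] = min(
                    waiting_priority.get(r, job.priority), job.priority
                )
    return started, waiting


AAP_JOB = "PRD_EDPP_CMD_IP_AAP_AAPAPPCRADDRESS_PI"
RP_JOB = "PRD_EDPP_CMD_IP_PPSVC_CCSRV2_PI"
BOTH = {"AWM_ING_PODIUM_RP": 1, "AWM_ING_PODIUM_AAP": 1}
RP_ONLY = {"AWM_ING_PODIUM_RP": 1}

# Hypothetical quantities: RP has free threads, AAP threads are all held.
stuck = {"AWM_ING_PODIUM_RP": 5, "AWM_ING_PODIUM_AAP": 0}

# 1) The incident: the higher-priority AND'd job waits on AAP and blocks RP.
incident = schedule([Job(AAP_JOB, 1, BOTH), Job(RP_JOB, 10, RP_ONLY)], stuck)

# 2) Workaround: give the RP-only job an equal priority.
workaround = schedule([Job(AAP_JOB, 1, BOTH), Job(RP_JOB, 1, RP_ONLY)], stuck)

# 3) What was done manually: free a thread of AAP.
freed = schedule(
    [Job(AAP_JOB, 1, BOTH), Job(RP_JOB, 10, RP_ONLY)],
    {"AWM_ING_PODIUM_RP": 5, "AWM_ING_PODIUM_AAP": 1},
)

print("incident:  ", incident)
print("workaround:", workaround)
print("freed AAP: ", freed)
```

In the first case both jobs wait even though AWM_ING_PODIUM_RP has free threads, which matches what was seen in WCC. In the second and third cases the RP-only job starts, matching the priority workaround and the manual release of AWM_ING_PODIUM_AAP respectively.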