Scheduled Jobs Not Processed by the Job Scheduler (BG Service Issue)


Article ID: 49034


Updated On:

Products

Clarity PPM On Premise, Clarity PPM SaaS

Issue/Introduction

One or more Clarity jobs are stuck indefinitely in the Waiting, Scheduled, or Processing status.

The scheduled time shown is stuck in the past. Pausing and resuming the jobs does not work.
Jobs that are run immediately get stuck in the Scheduled state.

 

Cause

The only known cause is that one or more Clarity jobs were killed manually on the database while they were processing. This causes waiting and scheduled jobs to be stuck indefinitely.

Search for the following error in the logs:

ERROR 2020-09-01 11:09:50,993 [Dispatch Time Slicing : [email protected]<server> (tenant=clarity)] niku.njs (clarity:admin:<session>:Time Slicing) Error updating job in scheduler [email protected]<server>
com.niku.union.persistence.PersistenceException: Error getting a DB connection
 at com.niku.union.persistence.PersistenceController.doProcessRequest(PersistenceController.java:620)
 at com.niku.union.persistence.PersistenceController.processRequest(PersistenceController.java:311)
 at com.niku.njs.SchedulerImpl.processDBRequest(SchedulerImpl.java:1614)
 at com.niku.njs.SchedulerImpl.unlockJobs(SchedulerImpl.java:1266)
 at com.niku.njs.SchedulerImpl.unlockJobs(SchedulerImpl.java:1252)
 at com.niku.njs.SchedulerImpl.completeAndReschedule(SchedulerImpl.java:656)
 at com.niku.njs.Dispatcher$BGTask.run(Dispatcher.java:690)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.sql.SQLException: Connection unavailable
 at com.niku.union.persistence.connection.ApacheContext.getConnection(ApacheContext.java:213)
 at com.niku.union.persistence.PersistenceController.createLocalContext(PersistenceController.java:461)
 at com.niku.union.persistence.PersistenceController.doProcessRequest(PersistenceController.java:569)

Environment

Release: All Supported Clarity releases

Resolution

Perform the following steps in the exact order listed.

Note: The Clarity administrator performs steps A and D. For SaaS Customers, the Broadcom SaaS team will perform steps B, C and E.

A. Cancel and re-run the jobs
1. Go to Home > Reports and Jobs
2. Pause all jobs and reports in the following states: WAITING, SCHEDULED
3. Cancel all PROCESSING instances.
4. Cancel all NOT SCHEDULED instances.
5. Filter for all CANCELLED jobs.
Make a note of the Cancelled jobs and take screenshots/notes of their schedule, as they will need to be re-entered at a later time.
6. Select and delete all CANCELLED instances.
7. Run an immediate instance of any fast-running job, such as the Clean User Sessions job.
 
If it does not go to the Processing status, delete the job and proceed to step B.
If it does go to the Processing status, proceed to step D.
 
 
B. Remove all possible orphan records and locks on the jobs

1. Stop all background services/deployments in the environment.

2. Run the query:
 

select id from cmn_sch_jobs csj
where csj.is_visible = 0 and csj.job_definition_id not in
(select id from cmn_sch_job_definitions where upper(job_code) in
('JOB_CHECK_HEART_BEAT','BPM_ESC_ESCALATION','BPM_ESC_RESCHEDULE_ESC_JOB','TELEMETRY_JOB','PURGE_CSV_DOWNLOADS'))
and csj.status_code in ('WAITING', 'SCHEDULED','PROCESSING');

  • If records are returned, please run the following SQL delete statements:

delete from cmn_sch_jobs csj
where csj.is_visible = 0 and csj.job_definition_id not in
(select id from cmn_sch_job_definitions where upper(job_code) in
('JOB_CHECK_HEART_BEAT','BPM_ESC_ESCALATION','BPM_ESC_RESCHEDULE_ESC_JOB','TELEMETRY_JOB','PURGE_CSV_DOWNLOADS'))
and csj.status_code in ('WAITING', 'SCHEDULED','PROCESSING');

commit;


3. Check for any locks on the scheduler with the query:


select * from prlock where prtablename = 'CMN_SCH_JOBS';

If any records are returned, then run the following SQL statement:
 
delete from prlock where prtablename = 'CMN_SCH_JOBS';
commit;



4. If on CA PPM 15.3 or higher, proceed to Step C. Otherwise, start the background services again.
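
As an optional check for steps B.2 and B.3, the filter from step B.2 can be grouped by status to show how many rows in each state the delete would affect, and the lock table can be counted after cleanup. This is a minimal sketch that reuses only the tables, columns, and job codes from the statements above; the aliases job_rows and remaining_locks are illustrative and not part of the original article.

```sql
-- Preview: number of hidden scheduler rows per status that the
-- step B.2 delete would remove (same filter as the original query).
select csj.status_code, count(*) as job_rows
from cmn_sch_jobs csj
where csj.is_visible = 0
  and csj.job_definition_id not in (
      select id
      from cmn_sch_job_definitions
      where upper(job_code) in (
          'JOB_CHECK_HEART_BEAT', 'BPM_ESC_ESCALATION',
          'BPM_ESC_RESCHEDULE_ESC_JOB', 'TELEMETRY_JOB',
          'PURGE_CSV_DOWNLOADS'))
  and csj.status_code in ('WAITING', 'SCHEDULED', 'PROCESSING')
group by csj.status_code;

-- After the step B.3 delete and commit, this count is expected to be 0.
select count(*) as remaining_locks
from prlock
where prtablename = 'CMN_SCH_JOBS';
```

Running the first statement before the delete gives a record of what was removed, which can be useful if the issue has to be escalated later.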
 
C. Clear the Scheduler Table for Clarity 15.3 and higher 
For version 15.3 and higher, a new table, CMN_SCH_JOB_CURR_RUNS, holds all currently processing jobs.
This table can sometimes get out of sync. To resolve this:
 
1. Make sure Step A has been performed (all processing jobs are stopped in the UI).
2. Stop the background services if not already done (as per Step B).
3. Run the statements:

select * from CMN_SCH_JOB_CURR_RUNS where job_definition_id != -1;

delete from cmn_sch_job_curr_runs where job_definition_id != -1;
commit;


4.    Start the background service/deployment
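
Before the background service is started in step 4, a count can confirm that the cleanup in step 3 took effect. This is a minimal sketch using only the table and column named above; the alias remaining_runs is illustrative.

```sql
-- Expected to return 0 once the step C.3 delete has been applied.
-- Rows with job_definition_id = -1 are deliberately left out,
-- matching the filter in the original statements.
select count(*) as remaining_runs
from cmn_sch_job_curr_runs
where job_definition_id != -1;
```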

D. Recreate the jobs in Clarity.
1. Re-enter the previously deleted 'Cancelled' jobs with their schedule.
2. Resume (unpause) all 'PAUSED' jobs and reports.

Note: For issues with the Time Slicing job being stuck, see also KB000046539

E. Check for jobs stuck in processing in the Database (needed only on rare occasions)
If the above does not help, check for any jobs stuck in processing in the Database by running the query below. If any jobs are found where the JOB_STATUS is Processing, but the JOB_RUN_STATUS is a different status, delete these jobs from the Scheduled Jobs in the UI and then reschedule the jobs.

Note: Before deleting the jobs, make a note of their recurrence setup so that they can be rescheduled afterwards.

select scj.id,scj.name,scj.status_code JOB_STATUS,csj.id RUNID,csj.status_code JOB_RUN_STATUS,(select user_name from cmn_sec_users where id=csj.created_by) from cmn_sch_jobs scj 
join cmn_sch_job_runs csj on csj.job_id=scj.id 
where csj.status_code='PROCESSING'; 
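
To make mismatches easier to spot in the result, the query above can be extended with a comparison column. This is a sketch rather than an official Broadcom query: it keeps the same tables, join, and filter, and the column aliases created_by_user and status_check are illustrative names.

```sql
select scj.id,
       scj.name,
       scj.status_code as job_status,
       csj.id          as runid,
       csj.status_code as job_run_status,
       (select user_name
          from cmn_sec_users
         where id = csj.created_by) as created_by_user,
       -- Flags rows where the job and its run disagree on status.
       case when scj.status_code = csj.status_code
            then 'MATCH'
            else 'MISMATCH'
       end as status_check
from cmn_sch_jobs scj
join cmn_sch_job_runs csj on csj.job_id = scj.id
where csj.status_code = 'PROCESSING';
```

Rows flagged MISMATCH are candidates for the delete-and-reschedule procedure described above. Review each one in the UI before deleting it.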

 

Additional Information