CCA jobs stuck, not completing, or hanging

Products

CA Configuration Automation CENDURA

Issue/Introduction

Discovery, Management Profile, or Network Profile jobs do not complete, or they hang.

Environment

Configuration Automation - All versions

Resolution

Navigate to the Network tab --> Jobs --> check the job(s) --> select Actions --> Cancel Job (it may take a few seconds to take effect).

If that doesn't work, you can recycle the Grid Node service(s), which will kill the job; however, the CCA Server won't know the job is done, completed, or dead until the next heartbeat (the default is believed to be 5 or 10 minutes).

This means the CCA UI won't show the new job status for around 10 minutes after you recycle the Grid Node(s). If you do not have any Grid Nodes, the same concept applies: recycle the CCA Server service instead.

If the problem still persists, the stalled job(s) need to be removed from the CCA database.

1. Before running the delete queries below, stop all the CCA services, then take a full backup of the CCA database:

- net stop candgateway
- net stop candserver
- net stop ccagridnode
- net stop ccaserver

- Take a full backup of the CCA database
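The stop sequence in step 1 could be scripted so it runs in the same order every time. The following is only a minimal sketch: the service names are the ones listed above, while the helper names `stop_commands` and `run_all` are hypothetical. It defaults to a dry run that only lists the commands; running it for real assumes an elevated prompt on the Windows host.

```python
import subprocess

# Service names exactly as listed in step 1, in the same order.
STOP_ORDER = ["candgateway", "candserver", "ccagridnode", "ccaserver"]


def stop_commands(services=STOP_ORDER):
    """Build one 'net stop <service>' argument list per service."""
    return [["net", "stop", name] for name in services]


def run_all(commands, dry_run=True):
    """Run each command in order and return (command text, exit code) pairs.

    With dry_run=True nothing is executed and the exit code is None.
    """
    results = []
    for argv in commands:
        text = " ".join(argv)
        if dry_run:
            results.append((text, None))
            continue
        completed = subprocess.run(argv, capture_output=True, text=True)
        results.append((text, completed.returncode))
    return results


if __name__ == "__main__":
    for text, code in run_all(stop_commands()):
        print(text, code)
```

Reviewing the dry-run output before passing `dry_run=False` keeps the script from stopping anything unexpectedly.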

2. Log in to the CCA database and execute the following queries:

select * from ACMQ_BLOB_TRIGGERS

select * from ACMQ_TRIGGERS

select * from ACMQ_JOB_DETAILS

3. Identify the stale jobs that are still present in the tables queried in Step 2.

4. Execute the delete queries in the following order, replacing the placeholder text with the management profile name returned by the select queries. (You decide which entries are no longer wanted and can be deleted.)

delete from ACMQ_BLOB_TRIGGERS where TRIGGER_NAME like '%<management profile name from the select query>%'

delete from ACMQ_TRIGGERS where TRIGGER_NAME like '%<management profile name from the select query>%'

delete from ACMQ_JOB_DETAILS where JOB_NAME like '%<management profile name from the select query>%'
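Because a LIKE pattern can match more rows than intended, it may help to count the matches before deleting them. The sketch below illustrates that preview-then-delete pattern under clear assumptions: it uses Python's built-in `sqlite3` with an in-memory database purely as a runnable stand-in for the real CCA database, the tables are reduced to the single name column used in the queries above, and the profile name `MP_Linux_Prod` and the helper functions are hypothetical.

```python
import sqlite3

# Tables and name columns from the delete statements above,
# in the order the article says to delete from them.
DELETE_ORDER = [
    ("ACMQ_BLOB_TRIGGERS", "TRIGGER_NAME"),
    ("ACMQ_TRIGGERS", "TRIGGER_NAME"),
    ("ACMQ_JOB_DETAILS", "JOB_NAME"),
]


def preview_stale(conn, profile_name):
    """Return {table: matching row count} without changing any data."""
    pattern = f"%{profile_name}%"
    counts = {}
    for table, column in DELETE_ORDER:
        row = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {column} LIKE ?", (pattern,)
        ).fetchone()
        counts[table] = row[0]
    return counts


def delete_stale(conn, profile_name):
    """Delete matching rows in the documented order as one transaction."""
    pattern = f"%{profile_name}%"
    deleted = {}
    with conn:  # commits on success, rolls back if a statement fails
        for table, column in DELETE_ORDER:
            cur = conn.execute(
                f"DELETE FROM {table} WHERE {column} LIKE ?", (pattern,)
            )
            deleted[table] = cur.rowcount
    return deleted


def make_demo_db():
    """Hypothetical stand-in data; the real tables have more columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ACMQ_BLOB_TRIGGERS (TRIGGER_NAME TEXT)")
    conn.execute("CREATE TABLE ACMQ_TRIGGERS (TRIGGER_NAME TEXT)")
    conn.execute("CREATE TABLE ACMQ_JOB_DETAILS (JOB_NAME TEXT)")
    conn.executemany(
        "INSERT INTO ACMQ_TRIGGERS VALUES (?)",
        [("MP_Linux_Prod_trigger",), ("MP_Windows_trigger",)],
    )
    conn.executemany(
        "INSERT INTO ACMQ_JOB_DETAILS VALUES (?)",
        [("MP_Linux_Prod_job",), ("MP_Windows_job",)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(preview_stale(db, "MP_Linux_Prod"))
    print(delete_stale(db, "MP_Linux_Prod"))
```

Note that in a LIKE pattern the characters `%` and `_` act as wildcards, so a profile name containing an underscore can match slightly more than the literal text. Comparing the preview counts with the rows identified in Step 3 is a reasonable check before deleting.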

5. OPTIONAL: Also check for stale grid operations using this query:

select * from acm_grid_operation where operation_state in (1, 2, 4, 6, 8)

Operation states:

1: dispatched but not started

2: running

4: being cancelled

6: aborted

8: waiting on dependencies

If the query returns any entries in the above states, look at the server description in the descr column. By inspecting it, you can identify the problematic servers that the stale jobs still reference.
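When the query returns many rows, grouping them by the descr value can make the problematic servers easier to spot. The following is a small sketch, assuming the result rows have been exported as (operation_state, descr) pairs; the state labels come from the list above, while the function name `group_by_server` and the sample descriptions are hypothetical.

```python
from collections import defaultdict

# State codes and meanings as listed in this article.
OPERATION_STATES = {
    1: "dispatched but not started",
    2: "running",
    4: "being cancelled",
    6: "aborted",
    8: "waiting on dependencies",
}


def group_by_server(rows):
    """Group (operation_state, descr) rows by descr.

    Returns {descr: [state label, ...]}; rows whose state is not one of
    the listed codes are ignored.
    """
    grouped = defaultdict(list)
    for state, descr in rows:
        label = OPERATION_STATES.get(state)
        if label is not None:
            grouped[descr].append(label)
    return dict(grouped)


if __name__ == "__main__":
    sample = [  # hypothetical rows
        (2, "server-a discovery"),
        (8, "server-a discovery"),
        (6, "server-b refresh"),
    ]
    for descr, labels in group_by_server(sample).items():
        print(descr, labels)
```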

6. Delete the stale job entries using this query:

delete from acm_grid_operation where operation_state in (1, 2, 4, 6, 8)

7. Start the CCA Server service first, followed by any CCA Grid Nodes:

- net start candserver
- net start candgateway
- net start ccaserver
- net start ccagridnode