IN-PROGRESS task issues - A Client’s Guide to Understanding and Resolving

Products

CA Identity Manager, CA Identity Suite

Issue/Introduction

Tasks that hang in the In-Progress state without completing their work are one of the most common problems, and almost every client will encounter it at some point while using Identity Manager (IDM).

Task execution is the primary method through which IDM completes work. When tasks hang In-Progress and do not complete, the overall impact on the Identity Manager product is high.

There are a large number of potential causes of tasks backing up In Progress and it is often difficult to determine where in the product the problem is. This document details the most common causes of in-progress tasks and how to resolve these types of issues.

Task persistence was never meant to be used for reporting or as a store of historical data. It is used to report on tasks that have been submitted, are in progress, or have completed with success or with error.

Every time you submit a task to IM, an OID is generated for that task. The OID MUST be unique, so before the task is submitted to task persistence, the task persistence table is searched to see whether that OID already exists. The less data in the tables, the faster these searches are.

The information in this document will help resolve some of the most common causes of tasks being stuck In-Progress.

Environment

Identity Manager and Virtual Appliance (IM) 14.x

Cause

Beginning the Investigation:

View Submitted Tasks and the Task Run-Time Management Task Persistence Monitor will provide valuable insight into the extent of the problem and should point to an initial area of focus.

For example, if only tasks related to Active Directory are stuck In-Progress, the focus should quickly be put on the Provisioning/endpoint layer. If all tasks are hung In-Progress, look at more global areas such as the JMS queue or the Task Persistence database. If all tasks are In-Progress or the overall user interface is performing poorly, this might also direct you to overall engine tuning, or to the database itself, such as index statistics.
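The triage reasoning above can be sketched in a few lines of Python. This is only an illustration: the record layout (dictionaries with `endpoint` and `status` keys) and the function name `suggest_focus` are assumptions made for this example, not an Identity Manager export format or API.

```python
from collections import Counter


def suggest_focus(tasks):
    """Suggest an initial area of focus from a list of task records.

    Each record is a dict with hypothetical keys 'endpoint' (str or None)
    and 'status' (str). This layout is an assumption for illustration.
    """
    stuck = [t for t in tasks if t["status"] == "In Progress"]
    if not stuck:
        return "no stuck tasks"

    # Every task is stuck: look at the global layers first.
    if len(stuck) == len(tasks):
        return "global: JMS queue, Task Persistence database, engine tuning"

    # All stuck tasks point at one endpoint: look at provisioning.
    endpoints = Counter(t["endpoint"] for t in stuck)
    if len(endpoints) == 1:
        name = next(iter(endpoints))
        if name is not None:
            return f"provisioning/endpoint layer: {name}"

    return "mixed: review load, tuning, and database health"
```

Feeding it a list in which only the Active Directory tasks are In Progress returns the provisioning/endpoint suggestion. The result is a starting point for the investigation, not a diagnosis.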

Main causes of In-progress tasks:

Patch Level
There are numerous causes of in-progress tasks that have been identified and patched out of the software.

JMS Health
- JMS is the messaging engine through which Tasks are processed by the application server and ultimately written into the Database. This is listed first as it is one of the simplest problems to locate using the Task Persistence Monitor feature, and the simplest to resolve.

Load / Environmental performance tuning
- The second most common cause is not properly tuning the environment initially, or adding new load to an existing environment without adjusting the tuning configuration.

Database Health
- Generally, the most common cause of In-Progress tasks is too much information in the Task Persistence database tables. The Task Persistence database contains the runtime tables of the Identity Manager product. The Task Persistence tables are where all task work is stored throughout the lifetime of the task's execution, and they are constantly being written, read, and updated. A large row count in these tables means each update takes longer, which over time slows all task execution down and leads to stoppages that leave tasks in the In-Progress state.

Provisioning / Endpoint issues
- Problems such as unavailable endpoints, administrator password changes, or stopped underlying services can all prevent tasks from completing, and in many cases leave them in the In-Progress state.

Cross-cluster communications
- Multiple default cluster configurations running on the same subnet.

Resolution

Patch Level

Support and Engineering are continuously identifying and resolving potential causes of In-Progress tasks. These fixes are then rolled into our publicly available Cumulative Patches and Cumulative Hotfix packs. Ensure that your deployment is fully patched to the most current Cumulative Patches and Cumulative Hotfix packs to avoid these known and resolved causes of In-Progress tasks.

Please review your installed version's Release Notes on the Broadcom Product Documentation Site for the current publicly available patches.

JMS Health

JMS is the messaging engine through which tasks are processed by the application server and ultimately written into the database. JMS is an application server feature which Identity Manager relies on.

Check Java Message Service (JMS) processing for problems:
In Task Run Time Management, check Task Message Health. Does the synthetic test complete?

This test creates dummy tasks which are pushed through the JMS queue into the database to check JMS queue performance.
If the result is not 100%, or is not returned within a few seconds, clear the JMS queue and restart the engine.

How to clear the JMS queue:

>Non-VAPP deployments:
For JBoss / WildFly: stop the application server, back up and then delete the contents of standalone/data/ and standalone/tmp/, then restart the application server.
For WebSphere / WebLogic: see your application server administrator for details on clearing JMS.
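For JBoss / WildFly, the back-up-then-delete step can be sketched as follows. This is a minimal illustration, not a supported tool: the function name `backup_and_clear` and both path arguments are hypothetical, and the script does not stop or start the application server, so the server must already be stopped before it is run.

```python
import shutil
from pathlib import Path


def backup_and_clear(standalone_dir, backup_dir):
    """Copy standalone/data and standalone/tmp to backup_dir, then empty them.

    Assumes the application server is already stopped. Both arguments are
    placeholders for paths in your own deployment.
    """
    standalone = Path(standalone_dir)
    backup = Path(backup_dir)
    cleared = []
    for name in ("data", "tmp"):
        src = standalone / name
        if not src.is_dir():
            continue
        # copytree raises FileExistsError if the backup target already
        # exists, so an earlier backup is never silently overwritten.
        shutil.copytree(src, backup / name)
        # Delete the contents but keep the directory itself.
        for child in src.iterdir():
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()
        cleared.append(name)
    return cleared
```

A call such as `backup_and_clear("/path/to/standalone", "/path/to/backup")` returns the list of directories that were emptied. The backup is taken before anything is deleted, so a failed copy leaves the original contents untouched.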

>For VAPP Deployments:
VAPP includes an Alias to accomplish this: deleteIDMJMSqueue
Deletes the Identity Manager JMS queue (/opt/CA/wildfly-idm/standalone/data/*).

https://techdocs.broadcom.com/us/en/symantec-security-software/identity-security/identity-suite/14-4/virtual-appliance/administering-virtual-appliance/using-the-login-shell.html

This should be completed on all nodes.

Configure Journal Size

This only applies for 14.3 environments. The below configuration does not apply in 14.4.

For Standalone IM and standalone-full-ha.xml:
The current journal file size and the minimum number of files are the default values, which may not be adequate with a heavy load.

Recommended values:

<journal-file-size>25485760</journal-file-size>
<journal-min-files>20</journal-min-files>
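As a rough planning aid, the disk space implied by these two values can be worked out directly. The sketch below assumes that the journal pre-allocates `journal-min-files` files of `journal-file-size` bytes each; the function name is hypothetical.

```python
def journal_footprint_bytes(file_size_bytes, min_files):
    """Minimum journal size on disk: one file of file_size_bytes per min file."""
    return file_size_bytes * min_files


size = journal_footprint_bytes(25485760, 20)
print(size)                     # 509715200
print(round(size / 2**20, 1))   # 486.1 (MiB)
```

With the recommended values, each node should have roughly 0.5 GB available for the journal before the change is applied.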

Configuring journal size for Virtual Appliance:

https://knowledge.broadcom.com/external/article?articleId=214890

Load / Environmental performance related issues

Has the environment been tuned, or is it running with out-of-the-box configurations? Out-of-the-box configurations work for many of our clients' requirements, but can quickly become a bottleneck as environmental complexity and usage grow.
The quickest and simplest tuning option, which almost all clients should perform, is increasing the memory allocation. See the tuning and fine-tuning sections of the documentation for your specific version. For 14.4:
https://techdocs.broadcom.com/us/en/symantec-security-software/identity-security/identity-manager/14-4/reference/performance-tuning.html

Does the issue only occur during heavy loads, for example right after starting a bulk task, a series of bulk tasks, or an Explore and Correlate (E&C)? Some clients have multiple bulk loads or E&C executions which may initially have been spread out enough, but which now take long enough to overlap. Increase the time between large tasks to ensure that prior tasks have time to complete. Also review the logs for ‘heap’ or ‘memory’ related errors such as: java.lang.OutOfMemoryError: Java heap space
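A log review for heap errors can be scripted. The sketch below searches any iterable of lines for the error string quoted above; the function name and the log file name in the usage note are assumptions, so substitute the actual application server log for your deployment.

```python
def find_memory_errors(lines, marker="java.lang.OutOfMemoryError"):
    """Return (line_number, text) pairs for every line containing marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]
```

For example, `with open("server.log", errors="replace") as f: hits = find_memory_errors(f)` collects every matching line with its line number. Comparing the timestamps on those lines with the start times of bulk tasks or E&C runs shows whether the errors line up with heavy load.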

Understanding memory heap requirements and HEAP PLANNING KB:
https://knowledge.broadcom.com/external/article?articleId=140353

Database Health

Issues at the database, primarily not cleaning up completed records in a timely manner, are the most frequent cause of In-Progress tasks.

Start by reviewing resource usage on the DB server:

Is the CPU or memory pegged at 100%?
Has the server run out of disk space?

Get the DBA / server team involved.

Check the size of the DB tables:

The Task Persistence (TP) tables should be under ~100,000 rows for best performance.

select count(*) from tasksession12_5

select count(*) from object12_5

select count(*) from lock12_5

select count(*) from runtimestatusdetail12

100,000 is not a hard limit; the 'maximum' number of rows depends on the backend database. We do have clients with high-performing, well-tuned databases where these tables contain millions of rows.

Cleanup of the Task Persistence database should be done on a regular, ongoing basis to keep the row counts low.

https://techdocs.broadcom.com/us/en/symantec-security-software/identity-security/identity-manager/14-4/configuring/task-persistence/monitor-health-of-the-task-persistence-database.html

https://techdocs.broadcom.com/us/en/symantec-security-software/identity-security/identity-manager/14-4/configuring/task-persistence/cleaning-up-the-object12-5-and-lock12-5-object-store-tables.html

Counts should be returned in milliseconds. If the counts take seconds to return, this may indicate overall database issues that should be discussed with your DBA. For example, data fragmentation, index problems, or a large number of locked tables can cause slowness. The DBA should have tools to check this.
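The row-count and response-time checks can be combined in one small script. This is a sketch under assumptions: `conn` is any Python DB-API connection to the Task Persistence database (the driver depends on your database and is not shown), and the function name, the 100,000-row guideline default, and the one-second "slow" cutoff are illustrative values, not product settings.

```python
import time

# Table names are taken from the queries in this article.
TABLES = ("tasksession12_5", "object12_5", "lock12_5", "runtimestatusdetail12")


def check_row_counts(conn, tables=TABLES, row_limit=100_000, slow_seconds=1.0):
    """Run select count(*) on each table and time how long it takes.

    row_limit and slow_seconds are illustrative defaults. Table names are
    interpolated into the SQL, so pass only trusted, fixed names.
    """
    results = []
    cursor = conn.cursor()
    for table in tables:
        start = time.perf_counter()
        cursor.execute(f"select count(*) from {table}")
        rows = cursor.fetchone()[0]
        elapsed = time.perf_counter() - start
        results.append({
            "table": table,
            "rows": rows,
            "seconds": elapsed,
            "over_limit": rows > row_limit,
            "slow": elapsed >= slow_seconds,
        })
    return results
```

A table flagged `over_limit` is a candidate for cleanup, while a table flagged `slow` with a modest row count points more towards the database-level issues to raise with the DBA.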

Also, check the size of the lock12_5 DB table. The select should return quickly and should be under 2 million records:

select count(*) from lock12_5

If the select count(*) on lock12_5 does not return quickly, or returns a large value, you will need to stop IM and truncate the lock12_5 table:

https://knowledge.broadcom.com/external/article/10771/lock12_5-table-grows-large-andor-causes.html

Database pool configuration tuning:
Symantec Identity Suite - Performance tuning for SQL pool connection sizes

Provisioning / Endpoint issues

Review View Submitted Tasks: is there a pattern? Are only specific tasks against one endpoint having issues? If the issue seems isolated to one endpoint, open Provisioning Manager and, from the right-click menu, check whether you can access a user's account information and perform CRUD operations (Create, Read, Update, Delete) in Provisioning Manager.

Can you test against other endpoints to ensure they are accessible?

If endpoint issues are clearly present, focus on and resolve them first, then use the built-in Resubmit Task option to retry the specific problem tasks.

https://techdocs.broadcom.com/us/en/symantec-security-software/identity-security/identity-manager/14-4/configuring/resubmit-stuck-in-progress-tasks.html

Review endpoints for failures and resolve endpoint issues.
Check the Provisioning logs (etatrans, etanotify, JCS).

Cross-cluster communications (JBoss/Wildfly)

Multiple default cluster configurations running on the same network can prevent tasks from completing. Shutting down all but one cluster will resolve the issue until the clusters are configured to be isolated.

Isolate JBoss EAP clusters running on the same network:

https://access.redhat.com/solutions/274263

A Red Hat account is required to access the above link. Contact your JBoss or WildFly support for further assistance.

Additional Information

If the above has not resolved your In-Progress issue, please collect all of the information below and upload it to a new case with L1 support.

Product version
Environment information

vApp:
# of nodes
Configuration of nodes (what services are where)
Take your time to identify geo clustering issues.

Non-vApp:
# of App servers and flavor/version
# of Provisioning servers

What database, version, and location?
Is this a new environment?
If this is an existing environment, is this the first time this environment has processed this number of tasks?
When was the last time the entire environment was restarted? If the environment has not been restarted recently a simple recycling of services in the environment may at least temporarily clear the issue.
What is the extent of the problem?
How many tasks are hanging In Progress?
Does this impact All tasks or only specific types of Tasks?
Is this a ‘slowness’ issue, where tasks are completing, just far slower than normal?
Do the stuck tasks complete if they are resubmitted through the System > Task Run Time Management > Resubmit Tasks feature?

https://techdocs.broadcom.com/us/en/symantec-security-software/identity-security/identity-manager/14-4/configuring/resubmit-stuck-in-progress-tasks.html