More jobs are executing than the DQM limit allows

Article ID: 91859


Products

CA Automic Dollar Universe

Issue/Introduction

The DQM job limit is used to limit the number of jobs executing in parallel.
In some cases, the number of job executions in parallel exceeds the set limit, for example 2 executions instead of 1. 

Another symptom is that the JobExe and JobPend counters are wrong (JobExe displays -1).

In that case, in UVC > Monitoring > Batch Queue Status, the "Pending" column displays 1 instead of 0 when no jobs are queued.

IMPORTANT: When no jobs are executing in the queue, the "Running" column shows -1 (minus one).
On the command line, the commands "uxlstque" and "uxshwque queue=<queue_name> full" likewise show -1 in the JOBEXE column.

Example:
On the command line, the uxshwque and uxlstque output shows the issue (JobExe counter at -1):

user@server:/automic/DUAS/node/bin#./uxshwque queue=SYS_BATCH
Queue SYS_BATCH
      JobLim 150
      JobQue 0 , JobExe -1 , JobHld 0 , JobPend 1

user@server:/automic/DUAS/node/bin#./uxlstque queue=SYS_BATCH

QUEUE NAME                      TYPE STA  JOBLIM  JOBQUE  JOBEXE  JOBHLD JOBPEND
--------------------------------------------------------------------------------
SYS_BATCH                       PHYS ON      150       0      -1       0       1

In UVC - Batch Queues Status, the Pending counter displays 1.

In UVC - Queued Jobs, no pending jobs are displayed.

Another symptom is that the counter JobPend displays a wrong (higher) value.
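
The negative counter can also be spotted by filtering the uxlstque listing. The following is a minimal sketch, not a vendor-supplied tool: the function name find_negative_jobexe is hypothetical, and it assumes the eight-column layout shown in the example above (queue name, TYPE, STA, JOBLIM, JOBQUE, JOBEXE, JOBHLD, JOBPEND). The sample input is copied from that example.

```shell
#!/bin/sh
# Sketch only: flag queues whose JOBEXE counter is negative.
# Assumption: rows look like the uxlstque example in this article,
# i.e. 8 whitespace-separated fields with JOBEXE in field 6.

find_negative_jobexe() {
    # Reads listing text on stdin; prints "<queue> <jobexe>" per bad row.
    # Header and separator lines are skipped because field 4 (JOBLIM)
    # and field 6 (JOBEXE) must both be integers.
    awk '
        NF == 8 && $4 ~ /^-?[0-9]+$/ && $6 ~ /^-?[0-9]+$/ {
            if ($6 < 0) print $1, $6
        }
    '
}

# Demo using the sample output from this article.
find_negative_jobexe <<'EOF'
QUEUE NAME                      TYPE STA  JOBLIM  JOBQUE  JOBEXE  JOBHLD JOBPEND
--------------------------------------------------------------------------------
SYS_BATCH                       PHYS ON      150       0      -1       0       1
EOF
```

On a node with the Dollar Universe environment loaded, the same function could be fed from the real command, for example ./uxlstque queue=SYS_BATCH | find_negative_jobexe.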

Environment

Dollar Universe node using DQM Job Limits.

Cause

DQM queue counters become wrong (executing is -1, or pending is wrongly high) when job terminations and job submissions occur at the same time. This defect is fixed in the version listed under Resolution.

Resolution

As a workaround, reinitialize the affected DQM queue(s).

1. After loading the Dollar Universe environment, execute the following command from the command line:
uxresetque queue=<queue_name> 
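
When several queues are affected, the reset command has to be repeated per queue. The sketch below only prints the uxresetque command lines so they can be reviewed before being run; it does not execute anything. The function name print_reset_commands is hypothetical, and the check that queue names contain only letters, digits and underscores is an assumption made for illustration, not a documented Dollar Universe rule.

```shell
#!/bin/sh
# Sketch only: build one "uxresetque queue=<name>" line per queue.
# Nothing is executed; the lines are printed for review.

print_reset_commands() {
    for q in "$@"; do
        case "$q" in
            # Assumption: valid names use letters, digits and "_" only.
            ''|*[!A-Za-z0-9_]*)
                echo "skipping invalid queue name: $q" >&2
                continue
                ;;
        esac
        printf 'uxresetque queue=%s\n' "$q"
    done
}

# Demo with the queue from the example in this article.
print_reset_commands SYS_BATCH
```

Printing rather than executing is a deliberate choice here, since the reset is best done at a controlled moment when no jobs are running on the queue.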


To prevent the issue from recurring, increase the "DQM send cycle" value on the node where the logical queue resides:
Node Settings - DQM Settings:

DQM send cycle  ->  120 -> increase to 864000

Save the settings and restart the node for the change to take effect.


Solution: Update to a fix version listed below or a newer version if available.

Fix version(s): 
Component: Application.Server
Dollar Universe 6.10.01 - released June 2019

Additional Information

Since version 6.10.01, an error of the following kind is written to the log to report the "negative counter" problem. The message itself is informational and nothing to worry about:
|ERROR|X|DQM|pid=p.t| u_dqm_upd_numbers_que     | u_dqm_end_job: [ENTRY:XXXX] Nbexe=(-1) is negative for queue: NAME_OF_QUEUE

If this occurs, stop the Launcher when no jobs are running and reset the affected queue as described in the workaround under Resolution.
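
To find out which queues have reported the error, the queue name can be extracted from the log message. This is a minimal sketch that assumes the message format shown above; the function name negative_counter_queues is hypothetical, and the location of the log file is not given in this article, so the log text is supplied on standard input.

```shell
#!/bin/sh
# Sketch only: list queues named in "Nbexe=(-N) is negative" log lines.
# Assumption: the message format matches the example in this article.

negative_counter_queues() {
    # Reads log text on stdin; prints each affected queue name once.
    sed -n 's/.*Nbexe=(-[0-9][0-9]*) is negative for queue: *\([^ ]*\).*/\1/p' |
        sort -u
}

# Demo using the sample log line from this article.
negative_counter_queues <<'EOF'
|ERROR|X|DQM|pid=p.t| u_dqm_upd_numbers_que     | u_dqm_end_job: [ENTRY:XXXX] Nbexe=(-1) is negative for queue: NAME_OF_QUEUE
EOF
```

Each name printed is a candidate for the queue reset described in the workaround.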