DQM logical queue freezes and stops submitting Jobs

Article ID: 241250

Updated On:

Products

CA Automic Dollar Universe

Issue/Introduction

In a DQM configuration where a logical queue on Node A points to a physical queue on Node B, jobs are submitted on Node A to the logical queue.

At random, one of the Logical Queues (a different one every night) "freezes" and stops submitting Jobs to the Physical Queue it points to.
All Jobs submitted to this queue around that time remain in Pending status and are not sent to the remote Physical Queue node.
Other queues continue to work normally during the same period.

Example of an occurrence:

The only errors that appear in universe.log are the following:

a) On logical queue node:
| 2022-02-23 01:11:28 |ERROR|X|DQM|pid=11978.140195785111296| u_dqm_cli_thread_trt | new client authentication failed:


b) On physical queue node:
| 2022-02-23 01:11:28 |ERROR|X|DQM|pid=5701708.17220| k_handshakeAuthent | u_req_serv to []/[logicalqueuenode] in error [-2]
| 2022-02-23 01:11:30 |ERROR|X|DQM|pid=5701708.17220| k_connect_auth | Request authentication to []/[logicalqueuenode] in error [-1] (check parameter timeout for UVMS connexion )
| 2022-02-23 01:11:30 |ERROR|X|DQM|pid=5701708.17220| owls_connect_auth | k_connect_auth_timeout(logicalqueuenode/DQM) returns error [205]
| 2022-02-23 01:11:30 |ERROR|X|DQM|pid=5701708.17220| o_callsrv_connect_r | Connection error 0 [Comlayer error]
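To check whether a given universe.log contains the same error signature, the lines above can be matched with a small script. The following is a minimal sketch, not a supported tool: it assumes the pipe-delimited line layout seen in the excerpts above, and the names `find_dqm_auth_errors`, `DqmError` and `DQM_AUTH_SIGNATURES` are hypothetical helpers introduced only for this illustration.

```python
import re
from typing import Iterable, List, NamedTuple

# Function names copied from the error lines quoted in this article.
DQM_AUTH_SIGNATURES = (
    "u_dqm_cli_thread_trt",
    "k_handshakeAuthent",
    "k_connect_auth",
    "owls_connect_auth",
    "o_callsrv_connect_r",
)


class DqmError(NamedTuple):
    timestamp: str
    function: str
    message: str


# Assumed layout: | timestamp |ERROR|<flag>|DQM|pid=...| function | message
LINE_RE = re.compile(
    r"^\|\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*"
    r"\|ERROR\|[^|]*\|DQM\|[^|]*\|\s*(?P<func>[^|]+?)\s*\|\s*(?P<msg>.*)$"
)


def find_dqm_auth_errors(lines: Iterable[str]) -> List[DqmError]:
    """Return the DQM ERROR lines whose function matches a known signature."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("func") in DQM_AUTH_SIGNATURES:
            hits.append(
                DqmError(
                    match.group("ts"),
                    match.group("func"),
                    match.group("msg").strip(),
                )
            )
    return hits
```

Passing an open file handle for universe.log on each node to `find_dqm_auth_errors` returns the matching entries with their timestamps, which can then be compared with the time at which the jobs started to remain in Pending status.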

 

Cause

Defect

Environment

Release : 6.x

Component : DOLLAR UNIVERSE

Context: Jobs submitted to a Logical Queue that points to a remote Physical Queue defined in a different node.

Resolution

Workaround:

To unblock the situation, launch a new job on the impacted queue. As soon as this is done, all the Pending jobs are resubmitted automatically.

Solution:

Engineering is currently working on this problem. The planned fix delivery method will be communicated as soon as possible.