DQM logical queue freezes and stops submitting Jobs

Article ID: 241250

Updated On:

Products

CA Automic Dollar Universe

Issue/Introduction

In a DQM setup where a logical queue on Node A points to a physical queue on Node B, jobs are submitted on Node A to the logical queue.

At random, one of the logical queues (a different one every night) "freezes" and stops submitting jobs to the physical queue it points to.
All jobs submitted to this queue around that time remain in Pending status and are not sent to the remote physical queue node.
Other queues continue to work normally during the same period.

Example of an occurrence:

The only errors that appear in universe.log are the following:

a) On logical queue node:
| 2022-02-23 01:11:28 |ERROR|X|DQM|pid=11978.140195785111296| u_dqm_cli_thread_trt | new client authentication failed:


b) On physical queue node:
| 2022-02-23 01:11:28 |ERROR|X|DQM|pid=5701708.17220| k_handshakeAuthent | u_req_serv to []/[logicalqueuenode] in error [-2]
| 2022-02-23 01:11:30 |ERROR|X|DQM|pid=5701708.17220| k_connect_auth | Request authentication to []/[logicalqueuenode] in error [-1] (check parameter timeout for UVMS connexion )
| 2022-02-23 01:11:30 |ERROR|X|DQM|pid=5701708.17220| owls_connect_auth | k_connect_auth_timeout(logicalqueuenode/DQM) returns error [205]
| 2022-02-23 01:11:30 |ERROR|X|DQM|pid=5701708.17220| o_callsrv_connect_r | Connection error 0 [Comlayer error]
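To check whether a node shows the same pattern, the log can be filtered for DQM error entries. The following is a minimal Python sketch, not a supported tool: the line layout is inferred only from the sample lines above, and the names `LogEntry`, `LINE_RE`, and `dqm_errors` are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    component: str
    function: str
    message: str


# Assumption: the pipe-separated layout is inferred from the sample
# universe.log lines in this article, not from a documented format.
LINE_RE = re.compile(
    r"^\|\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*"
    r"\|(?P<level>\w+)\|[^|]*\|(?P<component>\w+)\|[^|]*\|"
    r"\s*(?P<function>\S+)\s*\|\s*(?P<message>.*)$"
)


def dqm_errors(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield ERROR entries logged by the DQM component."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        if match["level"] == "ERROR" and match["component"] == "DQM":
            yield LogEntry(
                match["ts"],
                match["level"],
                match["component"],
                match["function"],
                match["message"],
            )
```

A possible use is to open universe.log on both nodes, pass the file object to `dqm_errors`, and compare the timestamps of the resulting entries with the time at which jobs started to remain in Pending status.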

Environment

Release : 6.x

Component : DOLLAR UNIVERSE

Context: Jobs submitted to a Logical Queue that points to a remote Physical Queue defined in a different node.

Cause

Defect

Resolution

Workaround:

To unblock the situation, launch a new job on the impacted queue. All Pending jobs are resubmitted automatically as soon as this is done.

Solution:

The issue could no longer be reproduced after upgrading to the following fix versions, in which the DQM Logical/Physical Algorithm was fixed under DU_AS-6529.

Please update to a fix version listed below or a newer version if available.

Fix version(s):
Component: Application Server (Node)
Dollar Universe 6.10.101 - Available
Dollar Universe 7.0.01 - Available
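When many nodes have to be checked, the comparison against the fix versions above can be scripted. This is a rough Python sketch under stated assumptions: version strings are assumed to be dot-separated integers, and the helper names `parse_version`, `FIX_VERSIONS`, and `has_fix` are hypothetical. How the installed version is obtained from a node is left out.

```python
from typing import Dict, Tuple

# Minimum fix version per major release, taken from the list above.
FIX_VERSIONS: Dict[int, Tuple[int, ...]] = {
    6: (6, 10, 101),
    7: (7, 0, 1),
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as '6.10.101' into a tuple of integers."""
    return tuple(int(part) for part in version.strip().split("."))


def has_fix(version: str) -> bool:
    """Return True if the version is at or above the fix level of its branch."""
    parsed = parse_version(version)
    major = parsed[0]
    if major > max(FIX_VERSIONS):
        # Assumption: major releases newer than those listed include the fix.
        return True
    minimum = FIX_VERSIONS.get(major)
    if minimum is None:
        return False  # older branch with no listed fix version
    return parsed >= minimum
```

For example, `has_fix("6.9.80")` is false under these assumptions, while `has_fix("6.10.101")` and `has_fix("7.0.01")` are true.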
