Jobs shown as "aborted" after launch while they are still running on the remote node

Article ID: 86191

Updated On:

Products

CA Automic Dollar Universe

Issue/Introduction

Error Message:
In UVC:
Jobs stay "pending" for a long time (30 to 120 seconds).
Jobs are shown as "aborted" shortly after launch, while they are still running on the remote node.

In universe.log on the node handling the logical queue (example only; timestamps, node IDs, and node names can differ):
##################
| 2013-01-17 12:17:03 |ERROR|X|IO |pid=2200.4272| o_io_cache_data_provider_ | Error getting Node <NBK971BTE002>: 600
| 2013-01-17 12:17:03 |ERROR|X|DQM|pid=2828.4620| o_connect_auth | o_io_api_getserv error: unable to get service [SIO]/[X]
| 2013-01-17 12:17:03 |ERROR|X|IO |pid=2200.4272| k_trt_req_network | Network request [G] returns -1 [] error code [0] error msg []
| 2013-01-17 12:17:03 |ERROR|X|DQM|pid=2828.4620| u_io_callsrv_connect_r | Error connecting to target IO server: ()
| 2013-01-17 12:17:03 |ERROR|X|DQM|pid=2828.4620| o_ext_read_name | Unable to connect to IO X of nodeID [N000000014]
##################
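
To check quickly whether a universe.log contains this signature, the excerpt above can be scanned with a short script. The following is a minimal sketch, not a vendor tool: it assumes the pipe-delimited layout shown in the excerpt, and the names LINE_RE, NODE_PATTERNS and unreachable_nodes are hypothetical helpers introduced for illustration. Only the two message shapes visible in the excerpt are recognized.

```python
import re
import sys
from collections import Counter

# Assumed line layout, taken from the excerpt above:
# | <date> <time> |<level>|<area>|<server>|pid=<pid>| <function> | <message>
LINE_RE = re.compile(
    r"^\|\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*"
    r"\|(?P<level>\w+)\|(?P<area>\w+)\|(?P<server>\w+)\s*"
    r"\|pid=(?P<pid>[\d.]+)\|\s*(?P<func>\S+)\s*\|\s*(?P<msg>.*)$"
)

# Message shapes seen in the excerpt that carry a node name or node ID.
NODE_PATTERNS = [
    re.compile(r"Error getting Node <([^>]+)>"),
    re.compile(r"Unable to connect to IO \w+ of nodeID \[(\w+)\]"),
]


def unreachable_nodes(lines):
    """Count ERROR lines per node name or node ID mentioned in the log."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("level") != "ERROR":
            continue
        for pattern in NODE_PATTERNS:
            found = pattern.search(match.group("msg"))
            if found:
                hits[found.group(1)] += 1
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py /path/to/universe.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for node, count in unreachable_nodes(handle).most_common():
            print(f"{node}: {count} connection error(s)")
```

A node that appears repeatedly in the output is a candidate for the unavailable remote node described in this article.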

Patch level detected: Dollar Universe 6.0.00
Product Version: Dollar.Universe 6.0.0

Description: This occurs when DQM is used with a logical queue linked to several remote physical queues.
If a remote node is not available, the DQM handling the logical queue endlessly tries to call the unavailable node. It keeps jobs pending for a long time before sending them to an available physical queue, and it cannot provide correct job monitoring (jobs can appear as aborted while they are still running on the remote node).

Cause

Cause type:
Defect
Root Cause: N/A

Environment

OS: All
OS Version: any

Resolution

Make sure that every remote node with a physical queue linked to the logical queue is available.
If one is not available, remove its physical queue from the logical queue settings.
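
As a first check of node availability, a plain TCP connection test from the node handling the logical queue can help. The following is a minimal sketch under stated assumptions: the host names and port numbers are placeholders to be replaced with the values configured for each remote node's IO server, and is_reachable and unavailable_nodes are hypothetical helper names. A successful TCP connection only shows that the port accepts connections; it does not prove that the Dollar Universe servers on that node are healthy.

```python
import socket


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unavailable_nodes(nodes, timeout=5.0):
    """Return the names of nodes whose (host, port) does not accept connections."""
    return [
        name
        for name, (host, port) in nodes.items()
        if not is_reachable(host, port, timeout)
    ]


if __name__ == "__main__":
    # Placeholder values: replace with the real host and IO server port
    # of each remote node that has a physical queue in the logical queue.
    remote_nodes = {
        "NODE_A": ("node-a.example.com", 10600),
        "NODE_B": ("node-b.example.com", 10600),
    }
    for name in unavailable_nodes(remote_nodes):
        print(f"{name} is not reachable; consider removing its physical queue")
```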

Fix Status: Released

Fix Version(s):
Component: Application.Server
Version: Dollar.Universe 6.0.0

Additional Information

Workaround:
N/A