Jobs launched from a logical queue to a remote physical queue turn to "aborted" on the logical queue's node while still running on the remote node.
Description: When the status of a job launched from a logical queue to a remote physical queue is checked, the job can sometimes turn to "aborted" on the logical queue's node (the mother node) even though its process is still running on the remote node.
Cause
Cause type: Defect
Root Cause: Any network issue that occurred while the mother node hosting the logical queue was checking the status of a job on a remote DQM physical queue caused the job to take the status "aborted".
Environment
OS: All
OS Version: any
Resolution
A retry has been added, along with a number of other improvements in DQM logical-physical queue management (reducing the number of status checks needed, etc.).
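
The idea of retrying a status check can be illustrated with a minimal sketch. This is not the product's actual implementation: the names check_remote_status, StatusCheckError and query are hypothetical, and returning "unknown" once all attempts fail is an assumption made for the example. The point is only that a single failed check caused by a network problem is not treated as proof that the job has aborted.

```python
import time


class StatusCheckError(Exception):
    """Hypothetical error: the remote physical queue could not be reached."""


def check_remote_status(query, retries=3, delay=0.0):
    """Return the remote job status, retrying on transient network failures.

    `query` is a hypothetical callable that contacts the remote node and
    returns a status string, or raises StatusCheckError on a network issue.
    """
    for attempt in range(1, retries + 1):
        try:
            return query()
        except StatusCheckError:
            # Transient failure: wait, then try again if attempts remain.
            if attempt < retries:
                time.sleep(delay)
    # Assumption for this sketch: report "unknown" rather than "aborted"
    # when the remote node could not be reached at all.
    return "unknown"
```

With such a check, a job whose status query fails twice and then succeeds is still reported with its real status instead of being marked "aborted" after the first failure.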