When a node is restarted, DQM can lose the reference to the PIDs of some of the running jobs.
The impacted jobs then end as aborted in Dollar Universe, even though they are still running on the system.
For each occurrence, the following message is written to universe.log, even when no trace level is activated. Its timestamp matches the job's end time shown in the Console 'Job Run' panel.
=====================================================================
| 2020-10-08 19:49:09 |ERROR|X|DQM|pid=21135.2474| owls_dqm_job_end | u_dqm_end_job returns 3
| 2020-10-09 00:34:24 |ERROR|X|DQM|pid=21135.6323| owls_dqm_job_end | u_dqm_end_job returns 3
| 2020-10-09 00:37:32 |ERROR|X|DQM|pid=21135.6336| owls_dqm_job_end | u_dqm_end_job returns 3
=====================================================================
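To match these log entries against the jobs shown as aborted in the 'Job Run' panel, the error lines can be pulled out of universe.log with a short script. The following is a minimal sketch, assuming the line layout is exactly as in the three samples above. The meaning of the `X` field and of the `pid=` value is not documented here, so the script keeps the PID as an opaque string. The function name `find_lost_pid_events` is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred only from the sample lines above (an assumption, not a
# documented log format). The field after ERROR is matched loosely.
DQM_END_ERROR = re.compile(
    r"^\|\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*\|ERROR\|[^|]*\|DQM\|"
    r"pid=(?P<pid>[\d.]+)\|\s*owls_dqm_job_end\s*\|\s*u_dqm_end_job returns 3\s*$"
)


def find_lost_pid_events(lines):
    """Return (timestamp, pid) for each 'u_dqm_end_job returns 3' line."""
    events = []
    for line in lines:
        m = DQM_END_ERROR.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            # Keep the pid field as text: its internal structure is unknown.
            events.append((ts, m.group("pid")))
    return events
```

Usage: `with open("universe.log") as f: print(find_lost_pid_events(f))`. The location of universe.log depends on the installation, so the path shown here is only a placeholder. Each returned timestamp can then be compared with the end time of an aborted job in the Console.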
Dollar Universe 6.9
After a Dollar Universe node restart, the DQM job launching loop could set a running job to aborted even though the job had not actually ended.
This was caused by a system error when checking a child process during a fork. To avoid this issue, the job status check is now performed again at the next DQM check cycle.
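The idea behind the fix (a failed status check leads to a retry at the next DQM cycle rather than an abort) can be illustrated with a small sketch. This is not Dollar Universe code, and it makes several assumptions. It assumes a POSIX system, and it assumes that a PID's existence is probed with signal 0. All names (`JobState`, `probe_pid`, `run_check_cycle`) are hypothetical. Do not run `probe_pid` on Windows, where `os.kill` with signal 0 terminates the target process.

```python
import os
from enum import Enum


class JobState(Enum):
    RUNNING = "running"
    ENDED = "ended"
    RECHECK = "recheck"  # status unknown: retry at the next cycle


def probe_pid(pid):
    """Hypothetical POSIX probe: signal 0 checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return JobState.ENDED
    except PermissionError:
        # The process exists but belongs to another user.
        return JobState.RUNNING
    except OSError:
        # Any other system error: the status is uncertain, so do not abort.
        return JobState.RECHECK
    return JobState.RUNNING


def run_check_cycle(tracked, probe=probe_pid):
    """Probe each tracked PID once and return (still_tracked, ended).

    PIDs whose probe hit a system error stay tracked, so they are checked
    again at the next cycle instead of being reported as aborted.
    """
    still_tracked, ended = [], []
    for pid in tracked:
        if probe(pid) is JobState.ENDED:
            ended.append(pid)
        else:  # RUNNING or RECHECK: keep the job for the next cycle
            still_tracked.append(pid)
    return still_tracked, ended
```

The key design choice is the third state, `RECHECK`. With only "running" and "ended", any unexpected error would have to be mapped to one of them, and mapping it to "ended" reproduces the symptom described above.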
This bug is fixed in Dollar Universe 6.10.11. Upgrade to version 6.10.11 or later to resolve the problem.