We encountered an issue with our CA RA Production environment. During initial analysis we found 7 blocking processes on the DB side (details attached to the case). In addition, some processes are showing on the ROC dashboard under the "Running" tab. We tried to kill these processes, but they are stuck and can be neither killed nor deleted. They have been running for almost a month.
Restarting the NAC did not kill these long-running processes; they still show as running.
Is it recommended to kill the blocking processes on the DB directly instead of restarting CA RA services, which causes an outage for other ongoing deployments?
No. Some locks may indicate a problem. However, please note:
The blocked processes identified (shown by select * from sysprocesses where blocked <> 0) are database processes. They do not necessarily correlate with blocked jobs in Release Automation. Blocking at the database level can occur for many reasons: sometimes it is normal and sometimes it indicates a problem, but we cannot infer much from it about blocked jobs or deployments in Release Automation.
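For investigation only, the output of the sysprocesses query can be reduced to the sessions at the head of each blocking chain. The following is a minimal sketch, not a Release Automation tool: it assumes the rows have already been exported as (spid, blocked) pairs, and the function name head_blockers is hypothetical. It also assumes that a session reporting itself as its own blocker should be treated as not blocked.

```python
from collections import defaultdict


def head_blockers(rows):
    """Map each head blocker spid to the spids waiting directly on it.

    rows: iterable of (spid, blocked) pairs taken from sysprocesses,
    where blocked is the spid of the blocking session, or 0 if none.
    """
    blocked_by = {spid: blocked for spid, blocked in rows}

    # Group waiting sessions under the session that blocks them.
    waiters = defaultdict(list)
    for spid, blocker in blocked_by.items():
        if blocker != 0 and blocker != spid:
            waiters[blocker].append(spid)

    # A head blocker blocks at least one session and is not itself blocked.
    # Sessions missing from rows (for example, filtered out by
    # "where blocked <> 0") are assumed not to be blocked.
    heads = sorted(
        spid for spid in waiters if blocked_by.get(spid, 0) in (0, spid)
    )
    return {head: sorted(waiters[head]) for head in heads}


if __name__ == "__main__":
    # Illustrative data only; these spids are not from the case.
    sample = [(51, 0), (52, 51), (53, 52), (54, 51), (60, 0)]
    print(head_blockers(sample))
```

Identifying a head blocker this way only shows where a database blocking chain starts. It does not indicate which Release Automation job, if any, is affected.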
The fact that restarting Release Automation cleared the database blocking while the Release Automation jobs remained blocked is consistent with the above.
Stuck jobs in Release Automation should be killed via JMX, and Support can assist the customer with that. If specific jobs cannot be killed via JMX, this would need to be investigated on a case-by-case basis.