Whenever jobs appear to be getting stuck, the first place to look is simple_health_monitor.log: do a simple keyword search in it for BLOCKED threads. If you find any BLOCKED threads, the next step is to take a thread dump immediately.
If you see BLOCKED threads in simple_health_monitor.log, always take a thread dump before restarting the Agent.
Taking a thread dump helps fix the problem early. Otherwise, restarting the Agent only clears the problem temporarily, and we then have to wait for it to happen again before we can take a thread dump and find the root cause. That wait can vary a lot.
A thread dump is not required for general error conditions. It is only needed when threads are BLOCKED or DEADLOCKED.
How to tell whether threads are BLOCKED/DEADLOCKED: as described above, search simple_health_monitor.log for BLOCKED. Any BLOCKED entries indicate that threads are getting blocked and that a thread dump is required.
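The keyword search can be scripted so that it is run the same way every time. This is a minimal POSIX shell sketch; the function name count_blocked_threads is hypothetical, and the log path in the usage comment is a placeholder, because the real location of simple_health_monitor.log depends on your agent installation. Note that it counts matching lines, not distinct threads.

```shell
#!/bin/sh
# Sketch: count lines containing BLOCKED in simple_health_monitor.log.
# ASSUMPTION: the caller passes the log path; the actual location depends
# on the agent installation.

count_blocked_threads() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # grep -c prints 0 and exits with status 1 when nothing matches,
    # so the exit status is ignored and only the printed count is used.
    grep -c 'BLOCKED' "$log_file" || true
}

# Example usage (hypothetical path):
# n=$(count_blocked_threads /path/to/agent/simple_health_monitor.log)
# if [ "$n" -gt 0 ]; then
#     echo "BLOCKED threads found: take a thread dump before restarting the Agent"
# fi
```

A non-zero count is the signal to take the thread dump before any restart.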
Taking thread dumps of a Java process is standard practice and is documented in the Java documentation.
On Linux and UNIX platforms it does not require any special tools; all you need is the process ID (PID) of the cybAgent process.
Just run kill -3 PID_of_the_cybAgent_process. Signal 3 (SIGQUIT) does not terminate the process: the JVM writes the dump to stdout or stderr and keeps running.
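A minimal shell sketch of this step, assuming the agent runs in a JVM that handles SIGQUIT as described above. request_thread_dump is a hypothetical helper that checks the PID is valid and signalable before sending signal 3. The pgrep pattern in the usage comment is also an assumption: confirm the PID with ps first, because more than one process may match "cybAgent".

```shell
#!/bin/sh
# Sketch: ask a running JVM for a thread dump by sending signal 3 (SIGQUIT).
# The JVM prints the dump and keeps running; it is not terminated.

request_thread_dump() {
    pid="$1"
    # Reject empty or non-numeric input.
    case "$pid" in
        ''|*[!0-9]*) echo "usage: request_thread_dump PID" >&2; return 2 ;;
    esac
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no signalable process with PID $pid" >&2
        return 1
    fi
    kill -3 "$pid"
}

# Example usage. ASSUMPTION: 'cybAgent' appears in the agent's command line.
# pid=$(pgrep -f cybAgent | head -n 1)
# request_thread_dump "$pid"
```

Run it as the same user that owns the agent process (or as root), otherwise the signalability check fails.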
The agent's stdout and stderr files are in the agent directory and are named nohup.stdout and nohup.stderr.
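To pull the most recent dump out of nohup.stdout for sharing, a sketch like the following may help. It assumes a HotSpot-based JVM (Oracle or OpenJDK), where each dump contains a line starting with "Full thread dump"; other JVMs may format or route the dump differently. last_thread_dump is a hypothetical helper that prints everything from the last such line to the end of the file, so any log output written after the dump is included as well, and the timestamp line HotSpot prints just before the header is not.

```shell
#!/bin/sh
# Sketch: print the most recent thread dump found in the agent's stdout file.
# ASSUMPTION: HotSpot-style output, where each dump has a line beginning
# with "Full thread dump".

last_thread_dump() {
    out_file="$1"
    awk '
        # Start a fresh buffer at every dump header, so only the last dump survives.
        /^Full thread dump/ { buf = ""; found = 1 }
        found { buf = buf $0 "\n" }
        # Exit with status 1 if no dump header was seen.
        END { if (found) printf "%s", buf; else exit 1 }
    ' "$out_file"
}

# Example usage (hypothetical agent directory):
# last_thread_dump /path/to/agent/nohup.stdout > "threaddump_$(date +%Y%m%d_%H%M%S).txt"
```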