CPU usage on one of the hosts increased from around 5% to a constant 40%.
Checking htop revealed that one of the gpss processes was consuming most of the CPU (for example, 1500% CPU on a host with 24 cores).
All of the jobs on that gpss process were stopped, but it continued to consume high CPU.
Restart the GPSS service and resubmit the jobs. This may resolve the issue.
If the issue persists or returns, the following information should be gathered when the issue is present:
curl "http://127.0.0.1:9999/debug/pprof/profile?seconds=10" > cpu_profile
curl "http://127.0.0.1:9999/debug/pprof/heap" > heap_profile
curl "http://127.0.0.1:9999/debug/pprof/goroutine?debug=1" > goroutines.txt
curl "http://127.0.0.1:9999/debug/pprof/block" > block_profile
go tool pprof cpu_profile
(pprof) top
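If the profiles need to be collected more than once, the curl commands above could be wrapped in a small shell helper. This is only a sketch: it assumes the pprof endpoint is reachable at the same address used above, and the names pprof_url, collect_profiles and PPROF_BASE, as well as the output file names, are hypothetical and not part of GPSS.

```shell
#!/bin/sh
# Assumption: the pprof endpoint listens on the address used in the curl
# commands above. Override PPROF_BASE if the host or port differs.
PPROF_BASE="${PPROF_BASE:-http://127.0.0.1:9999/debug/pprof}"

# Build the URL for one pprof endpoint, with an optional query string.
# Usage: pprof_url <endpoint> [query]
pprof_url() {
    if [ -n "${2:-}" ]; then
        printf '%s/%s?%s\n' "$PPROF_BASE" "$1" "$2"
    else
        printf '%s/%s\n' "$PPROF_BASE" "$1"
    fi
}

# Save the CPU, heap, goroutine and block profiles into one directory.
# Usage: collect_profiles <output_dir>
collect_profiles() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1
    # -sS hides the progress meter but still prints errors.
    curl -sS -o "$out_dir/cpu_profile"    "$(pprof_url profile seconds=10)"
    curl -sS -o "$out_dir/heap_profile"   "$(pprof_url heap)"
    curl -sS -o "$out_dir/goroutines.txt" "$(pprof_url goroutine debug=1)"
    curl -sS -o "$out_dir/block_profile"  "$(pprof_url block)"
}
```

Run it while the CPU usage is high, for example: collect_profiles "gpss_pprof_$(date +%Y%m%d_%H%M%S)". Keeping each collection in its own timestamped directory makes it easier to compare profiles taken at different times.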
Additionally, it would be useful to collect the following Linux artefacts for the GPSS process: