After an AutoSys upgrade, the WebUI Collector (CA-wcc-services) process keeps restarting and eventually ends up in the not-running state. This behavior can also occur on any AutoSys version where no upgrade was performed.
The collector log, <WCC_INSTALL_LOCATION>/collectors/WCC_COLLECTOR/log/WCC_COLLECTOR.log, contains messages like the following:
STATUS | wrapper | 2025/02/13 09:21:10 | 235 | JVM received a signal SIGKILL (9).
STATUS | wrapper | 2025/02/13 09:21:10 | 235 | JVM process is gone.
STATUS | wrapper | 2025/02/13 09:21:10 | 235 | JVM process exited with a code of 1, setting the Wrapper exit code to 1.
ERROR | wrapper | 2025/02/13 09:21:10 | 235 | JVM exited unexpectedly.
INFO | wrapper | 2025/02/13 09:21:10 | 235 | JVM was running for 42 seconds (less than the successful invocation time of 300 seconds).
INFO | wrapper | 2025/02/13 09:21:10 | 235 | Incrementing failed invocation count (currently 5).
FATAL | wrapper | 2025/02/13 09:21:10 | 235 | There were 5 failed launches in a row, each lasting less than 300 seconds. Giving up.
FATAL | wrapper | 2025/02/13 09:21:10 | 235 | There may be a configuration problem: please check the logs.
STATUS | wrapper | 2025/02/13 09:21:10 | 235 | <-- Wrapper Stopped
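To check a collector log for this pattern without reading it line by line, a small bash function like the following can help. It is only a sketch: the function name check_collector_log is hypothetical, and the matched strings are copied from the sample messages above, so other wrapper versions may word them differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize wrapper evidence of an external kill.
# The matched strings are copied from the sample wrapper messages;
# other wrapper versions may word them differently (assumption).
check_collector_log() {
  local log="$1" kills giveup
  # Number of lines reporting that the JVM was killed with SIGKILL.
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps
  # that from aborting scripts that run with "set -e".
  kills=$(grep -c 'JVM received a signal SIGKILL' "$log" || true)
  # Did the wrapper stop restarting the JVM?
  if grep -q 'Giving up' "$log"; then
    giveup=yes
  else
    giveup=no
  fi
  echo "sigkill_count=$kills wrapper_gave_up=$giveup"
}
```

Usage: check_collector_log <WCC_INSTALL_LOCATION>/collectors/WCC_COLLECTOR/log/WCC_COLLECTOR.log prints a line such as sigkill_count=5 wrapper_gave_up=yes. A nonzero count suggests that something sent SIGKILL to the collector JVM.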
The collector is a Java program that runs under a wrapper. The wrapper monitors the JVM and restarts it whenever the JVM exits unexpectedly. If the JVM fails 5 times in a row (the default), each run lasting less than 300 seconds, the wrapper gives up and leaves the JVM stopped, and CA-wcc-services is therefore reported as not-running.
As a result, job, alarm, machine, and reporting data collection stops working.
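The give-up rule can be modeled in a few lines to show why repeated kills leave the collector stopped. This is an illustration, not the wrapper's code: the function wrapper_outcome is hypothetical, the thresholds (5 launches, 300 seconds) are passed in from the log text rather than read from the wrapper configuration, and it assumes, as "in a row" in the log suggests, that a run lasting at least 300 seconds resets the failure count.

```shell
#!/usr/bin/env bash
# Hypothetical model of the wrapper's give-up rule, for illustration only.
# Assumption: a run lasting at least <min_ok_secs> resets the failure count,
# as "failed launches in a row" in the log suggests.
# Usage: wrapper_outcome <max_failed> <min_ok_secs> <run_secs>...
wrapper_outcome() {
  local max_failed="$1" min_ok="$2" failed=0 secs
  shift 2
  for secs in "$@"; do
    if [ "$secs" -lt "$min_ok" ]; then
      # Shorter than the successful invocation time: count as a failure.
      failed=$((failed + 1))
    else
      # Long enough to count as a successful start: reset the count.
      failed=0
    fi
    if [ "$failed" -ge "$max_failed" ]; then
      echo "gave up after $failed failed launches"
      return 0
    fi
  done
  echo "still restarting (failed count $failed)"
}
```

For example, wrapper_outcome 5 300 42 42 42 42 42 prints "gave up after 5 failed launches": a JVM that is killed about 42 seconds after every start never reaches 300 seconds, so five kills are enough to leave the collector stopped.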
AutoSys Workload Automation
External intrusion detection or security software agents kill the CA-wcc-services Java process.
To trace which process actually sends the kill signal to the CA-wcc-services Java process (the JVM run by the wrapper), some level of Unix auditing needs to be enabled. Two approaches are described below.
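Approach 1: auditd
Enable auditing of the kill system call with the audit key kill_rule, for example on an x86_64 host:
auditctl -a always,exit -F arch=b64 -S kill -k kill_rule
Example: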
ps -fe|grep CollectorApplication| grep -v grep
## PID obtained is 130194
## obtain the PID of the current shell, to match it against the sender of the kill in the audit records
ps -fe|grep $$
root      120630 120613  0 19:03 pts/1    00:00:00 -bash
## simulate the external kill of the collector PID 130194, to confirm the audit rule records it
kill -9 130194
## search the audit log for records tagged with the kill_rule key
ausearch -k kill_rule
## parse through the entries to look for the collector Java PID
Time->Thu Mar 6 20:22:39 2025
type=PROCTITLE msg=audit(1741292559.669:125279): proctitle="-bash"
type=OBJ_PID msg=audit(1741292559.669:125279): opid=130194 oauid=1010 ouid=1011 oses=7 obj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 ocomm="java"
type=SYSCALL msg=audit(1741292559.669:125279): arch=c000003e syscall=62 success=yes exit=0 a0=1fc92 a1=9 a2=0 a3=7fc8651d4f00 items=0 ppid=120613 pid=120630 auid=1010 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts1 ses=9 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="kill_rule"
----
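In the SYSCALL record, a0 and a1 are the arguments of the kill call in hexadecimal: a0=1fc92 is PID 130194 (the collector JVM) and a1=9 is SIGKILL, while pid=120630 and comm="bash" identify the sender, which matches the shell PID found with ps. The bash sketch below decodes such a record. The function names are hypothetical, it assumes an x86_64 host (arch=c000003e, where syscall 62 is kill), and it splits fields on spaces, so values containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to decode an auditd SYSCALL record for kill().
# Assumes x86_64 (arch=c000003e), where syscall 62 is kill, and fields
# separated by single spaces with no spaces inside values.

# Print the value of field $2 in record $1, with double quotes removed.
audit_field() {
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p" | tr -d '"' | head -n 1
}

# Usage: decode_kill_record '<type=SYSCALL line from ausearch -k kill_rule>'
decode_kill_record() {
  local rec="$1" a0 a1
  if [ "$(audit_field "$rec" syscall)" != "62" ]; then
    echo "not a kill() record (x86_64 syscall 62)" >&2
    return 1
  fi
  # a0 (target PID) and a1 (signal number) are hex without a 0x prefix;
  # printf %d converts the 0x-prefixed values to decimal.
  a0=$(audit_field "$rec" a0)
  a1=$(audit_field "$rec" a1)
  printf 'sender=%s(%s) exe=%s target_pid=%d signal=%d\n' \
    "$(audit_field "$rec" comm)" "$(audit_field "$rec" pid)" \
    "$(audit_field "$rec" exe)" "0x$a0" "0x$a1"
}
```

Passing the SYSCALL line above to decode_kill_record prints sender=bash(120630) exe=/usr/bin/bash target_pid=130194 signal=9. When the collector is killed by the security agent instead of a test kill, the sender fields name that agent's process.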
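Approach 2: systemtap
An extract of some of the steps is shown below. Install SystemTap and prepare the kernel packages it needs:
yum -y install systemtap
stap-prep
## save the following script as sigcatch.stp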
#!/usr/bin/env stap
probe begin {
  printf("Monitoring SIGTERM| SIGKILL| SIGINT signal: Start\n");
}
// report each SIGTERM, SIGKILL or SIGINT: epoch seconds, signal, target process and sender
probe signal.send {
  if (sig_name == "SIGTERM" || sig_name == "SIGKILL" || sig_name == "SIGINT") {
    printf("%d %s was sent to %s(pid:%d) by %s(%d) uid:%d\n",
           gettimeofday_s(), sig_name, pid_name, sig_pid, execname(), pid(), uid())
  }
}
probe end {
  printf("Monitoring SIGTERM| SIGKILL| SIGINT signal: Stop\n");
}
stap -v sigcatch.stp
##Monitoring SIGTERM| SIGKILL| SIGINT signal: Start
...
...
1741287845 SIGKILL was sent to java(pid:124090) by AgentActionPrio(959900) xxxxxxxxx
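On a busy system the SystemTap output can be long. The bash filter below, a hypothetical helper named sigkill_senders, keeps only SIGKILL lines aimed at a process named java and prints the epoch time, the target PID, and the sender's name and PID. It relies on the exact printf format in sigcatch.stp; if the script's message changes, the pattern must change too.

```shell
#!/usr/bin/env bash
# Hypothetical filter for sigcatch.stp output. It relies on the script's
# printf format:
#   "<epoch> <signal> was sent to <name>(pid:<n>) by <sender>(<pid>) uid:<uid>"
# Prints "<epoch> <target_pid> <sender_name> <sender_pid>" for each SIGKILL
# sent to a process named java; all other lines are dropped.
sigkill_senders() {
  sed -n -E 's/^([0-9]+) SIGKILL was sent to java\(pid:([0-9]+)\) by ([^(]+)\(([0-9]+)\).*/\1 \2 \3 \4/p'
}
```

One way to use it is to capture the output with stap -v sigcatch.stp > sigcatch.out, stop stap with Ctrl-C after the collector has been killed again, then run sigkill_senders < sigcatch.out. For the sample line above it prints 1741287845 124090 AgentActionPrio 959900. While the sender is still running, ps -p 959900 -o pid,user,args shows its full command line, which can help identify the product it belongs to.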
Once the sending process is identified (AgentActionPrio in the SystemTap output above), work with your security administrators to tune the security software, either by allowlisting the Java processes launched by AutoSys or by configuring it to recognize them as non-threatening, so that they run normally without being killed.