AutoSys WebUI Collector (CA-wcc-services) process keeps restarting and eventually ends up as not-running

Article ID: 390191

Products

Autosys Workload Automation

Issue/Introduction

After an AutoSys upgrade, the WebUI Collector (CA-wcc-services) process keeps restarting and eventually ends up as not-running. The same behavior can also occur on any AutoSys version where no upgrade was attempted.

The collector log, <WCC_INSTALL_LOCATION>/collectors/WCC_COLLECTOR/log/WCC_COLLECTOR.log, contains messages like these:

STATUS | wrapper  | 2025/02/13 09:21:10 |      235 | JVM received a signal SIGKILL (9).
STATUS | wrapper  | 2025/02/13 09:21:10 |      235 | JVM process is gone.
STATUS | wrapper  | 2025/02/13 09:21:10 |      235 | JVM process exited with a code of 1, setting the Wrapper exit code to 1.
ERROR  | wrapper  | 2025/02/13 09:21:10 |      235 | JVM exited unexpectedly.
INFO   | wrapper  | 2025/02/13 09:21:10 |      235 | JVM was running for 42 seconds (less than the successful invocation time of 300 seconds).
INFO   | wrapper  | 2025/02/13 09:21:10 |      235 |   Incrementing failed invocation count (currently 5).
FATAL  | wrapper  | 2025/02/13 09:21:10 |      235 | There were 5 failed launches in a row, each lasting less than 300 seconds.  Giving up.
FATAL  | wrapper  | 2025/02/13 09:21:10 |      235 |   There may be a configuration problem: please check the logs.
STATUS | wrapper  | 2025/02/13 09:21:10 |      235 | <-- Wrapper Stopped

 

The collector process is a Java program that runs under a wrapper. The wrapper monitors this JVM and restarts it if the JVM disappears for any reason, up to a default of 5 times. In the log above, each run lasted less than the 300-second successful invocation time, so each one counted as a failed launch. After the fifth consecutive failure the wrapper stops restarting the JVM and leaves it stopped, so CA-wcc-services shows up as not-running.

As a result, job, alarm, machine, and reporting data collections stop working.
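
To confirm that a collector log shows this signature, the wrapper messages can be counted directly. The following is a minimal sketch that assumes the log format shown above; the function name check_collector_log is illustrative and not part of AutoSys.

```shell
#!/bin/sh
# Illustrative helper (not part of AutoSys): count the wrapper messages
# that make up the failure signature shown in the log excerpt above.
check_collector_log() (
    log_file="$1"
    # grep -c prints the number of matching lines;
    # "|| true" ignores grep's non-zero exit status when nothing matches.
    sigkills=$(grep -c 'JVM received a signal SIGKILL' "$log_file" || true)
    gave_up=$(grep -c 'failed launches in a row' "$log_file" || true)
    echo "sigkill_events=${sigkills} wrapper_gave_up=${gave_up}"
)
```

Call it with the collector log path, for example check_collector_log "<WCC_INSTALL_LOCATION>/collectors/WCC_COLLECTOR/log/WCC_COLLECTOR.log" after substituting the real install location. A non-zero sigkill_events together with a non-zero wrapper_gave_up matches the condition described in this article.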

Environment

AutoSys Workload Automation

Cause

External intrusion detection or security software agents are killing the CA-wcc-services Java process.

 

Some level of Unix auditing needs to be enabled to identify the process that is actually sending the kill signal to the CA-wcc-services Java process (the JVM running under the wrapper). Two approaches are described below:

  1. Enable OS auditing (external link). Implement the steps in the Resolution section of that article to enable auditing, then search the audit logs to find which process/user is killing the collector JVM.


    Example (after enabling the auditing described in the linked article):

      • using a terminal on the WebUI Server, identify the Java PID for the collector

        ps -fe|grep CollectorApplication| grep -v grep
        ## PID obtained is  130194

        ## obtain the PID of current shell (to map it to who kills the above collector)
        ps -fe|grep $$

        root      120630  120613  0 19:03 pts/1    00:00:00 -bash

        ## kill the collector PID 130194

        kill -9 130194

         
      • via another terminal to the same server, run ausearch:

        ausearch -k kill_rule
        ## parse through the entries to look for the collector Java PID

        Time->Thu Mar  6 20:22:39 2025
        type=PROCTITLE msg=audit(1741292559.669:125279): proctitle="-bash"
        type=OBJ_PID msg=audit(1741292559.669:125279): opid=130194 oauid=1010 ouid=1011 oses=7 obj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 ocomm="java"
        type=SYSCALL msg=audit(1741292559.669:125279): arch=c000003e syscall=62 success=yes exit=0 a0=1fc92 a1=9 a2=0 a3=7fc8651d4f00 items=0 ppid=120613 pid=120630 auid=1010 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts1 ses=9 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="kill_rule"
        ----
     

      • This shows that PID 120630 (pid= in the SYSCALL record, the bash shell from which the kill was issued) sent a kill signal to the Java/collector PID 130194 (opid= in the OBJ_PID record).
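
The SYSCALL record also carries the target PID and the signal number as hexadecimal syscall arguments: for kill (syscall=62 on x86_64), a0 is the target PID and a1 is the signal. The sketch below decodes one such record, assuming the field layout shown in the example above; the function names audit_field and decode_kill_record are illustrative only.

```shell
#!/bin/sh
# Illustrative helpers: decode one audit SYSCALL record for kill(2).

# Print the value of one key=value field from a single audit record.
audit_field() (
    name="$1"
    record="$2"
    # One field per line, keep the first exact key match, strip quotes.
    printf '%s\n' "$record" | tr ' ' '\n' | grep "^${name}=" | head -n 1 \
        | cut -d= -f2- | tr -d '"'
)

# Summarize who sent which signal to which PID.
decode_kill_record() (
    record="$1"
    # a0 and a1 are hexadecimal; printf converts them to decimal.
    target_pid=$(printf '%d' "0x$(audit_field a0 "$record")")
    signal=$(printf '%d' "0x$(audit_field a1 "$record")")
    sender_pid=$(audit_field pid "$record")
    sender_comm=$(audit_field comm "$record")
    echo "signal ${signal} sent to pid ${target_pid} by ${sender_comm} (pid ${sender_pid})"
)
```

Applied to the SYSCALL record above, a0=1fc92 decodes to 130194 and a1=9 to signal 9 (SIGKILL), which agrees with opid=130194 in the OBJ_PID record.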

     

  2.  Install and enable systemtap (external link, authorization required). An extract of the steps is shown below:
      • Install systemtap

        yum -y install systemtap
        stap-prep

 

      • Create the following systemtap script as sigcatch.stp to monitor signals 2 (SIGINT), 9 (SIGKILL), and 15 (SIGTERM):

        #!/usr/bin/env stap

        probe begin
        {
            printf("Monitoring SIGTERM| SIGKILL| SIGINT signal: Start\n");
        }

        probe signal.send {
            if (sig_name == "SIGTERM" || sig_name == "SIGKILL" || sig_name == "SIGINT") {
                printf("%d %s was sent to %s(pid:%d) by %s(%d) uid:%d\n", 
                    gettimeofday_s(), sig_name, pid_name, sig_pid, execname(), pid(), uid())
            }
        }

        probe end
        {
            printf("Monitoring SIGTERM| SIGKILL| SIGINT signal: Stop\n");
        }

      • As the root user, run the systemtap script and restart CA-wcc-services (its JVM will be killed again by whatever is terminating it, and the script will record the sender):

        stap -v sigcatch.stp
        ##Monitoring SIGTERM| SIGKILL| SIGINT signal: Start
        ...
        ...
        1741287845 SIGKILL was sent to java(pid:124090) by AgentActionPrio(959900) xxxxxxxxx

 

      • The above shows that PID 959900 (AgentActionPrio, a security software module) sent SIGKILL to the Java/collector PID 124090.
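
On a busy server the systemtap output can contain many unrelated signals. A small filter such as the sketch below can reduce it to the lines that target the collector JVM. It assumes the output format produced by the printf in sigcatch.stp above; the function name find_signal_senders is illustrative only.

```shell
#!/bin/sh
# Illustrative helper: read sigcatch.stp output on stdin and print
# "<signal> <sender name> <sender pid>" for signals sent to one target PID.
find_signal_senders() (
    target_pid="$1"
    # Expected line format (from the script's printf):
    #   <epoch> <SIGNAME> was sent to <name>(pid:<pid>) by <sender>(<sender pid>) uid:<uid>
    sed -n "s/^[0-9]* \([A-Z]*\) was sent to .*(pid:${target_pid}) by \([^(]*\)(\([0-9]*\)).*/\1 \2 \3/p"
)
```

For example, stap -v sigcatch.stp | find_signal_senders 124090 would print only the signals sent to PID 124090, along with the name and PID of each sender.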

 

Resolution

Work with your security administrators to tune the security software, or add an allow-list (whitelist) entry, so that the Java processes launched by AutoSys are not treated as a threat and are allowed to run normally without being killed.