
WCC high CPU utilization on a Linux machine


Article ID: 258185



Products

CA Workload Automation AE

Issue/Introduction

Hello CA,

We are trying to understand what caused the Java CPU spike that happened today. We restarted the wcc/igateway and dxserver services, but CPU usage is still high.
I checked the logs under wcc/logs, but they gave me no clue.

 

top - 14:49:33 up 68 days, 11:42,  1 user,  load average: 1.54, 1.67, 1.95
Tasks: 121 total,   2 running, 119 sleeping,   0 stopped,   0 zombie
%Cpu(s): 47.6 us,  6.7 sy,  0.3 ni, 45.2 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem :  7904516 total,  1407160 free,  3149748 used,  3347608 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  4030820 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                         
 2381 wccuser   20   0 5053580   1.7g  28052 S 103.7 22.0 599:53.11 java                                                            
 2197 wccuser   20   0 3034768 542864  31436 S   3.7  6.9  33:39.23 java                                                            
22551 root      24   4 1200932  51600   5848 S   1.3  0.7   7:04.77 aws                                                             
 1909 root      20   0  294184  39640   4076 S   0.3  0.5  15:10.97 ruby                                                            
20483 root      20   0  166524   2476   1760 R   0.3  0.0   0:06.95 top                                                             
    1 root      20   0  128312   6676   3932 S   0.0  0.1  71:54.94 systemd                                                         
    2 root      20   0       0      0      0 S   0.0  0.0   0:04.55 kthreadd                                                        
    4 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H                                                    
    6 root      20   0       0      0      0 S   0.0  0.0   4:19.86 ksoftirqd/0                                                     
    7 root      rt   0       0      0      0 S   0.0  0.0   0:20.74 migration/0     

 

Total CPU was hitting 100% earlier today before the restart. Any thoughts?

Environment

WCC 12.x
EEM 12.x
AutoSys 12.x

Cause

Engineering went through the dumps and identified that EEM SDK calls are consuming high CPU cycles.

Example:

3XMTHREADINFO      "m_pethread" J9VMThread:0x0000000005DFF800, omrthread_t:0x00007F230C0CE6B8, java/lang/Thread:0x00000000412C3448, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x75, isDaemon:true)
3XMTHREADINFO1            (native thread ID:0x200, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000081)
3XMTHREADINFO2            (native stack address range from:0x00007F22DFA2D000, to:0x00007F22DFA6D000, size:0x40000)
3XMCPUTIME               CPU usage total: 481703.419476232 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=361823984 (0x1590FEF0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/io/UnixFileSystem.list(Native Method)
4XESTACKTRACE                at java/io/File.list(File.java:1122(Compiled Code))
4XESTACKTRACE                at java/io/File.listFiles(File.java:1207(Compiled Code))
4XESTACKTRACE                at com/ca/eiam/SafeSAF.getFirstSafFileName(SafeSAF.java:410(Compiled Code))
4XESTACKTRACE                at com/ca/eiam/SafeCache.processOldSafFiles(SafeCache.java:2723(Compiled Code))
4XESTACKTRACE                at com/ca/eiam/SafeCache.access$3200(SafeCache.java:65)
4XESTACKTRACE                at com/ca/eiam/SafeCache$OutstandingEvtLoop.run(SafeCache.java:2365)
3XMTHREADINFO3           Native callstack:
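
A thread like the one above stands out because of its 3XMCPUTIME value. As a minimal sketch (not necessarily how Engineering analyzed the dumps), the function below ranks the threads in a javacore by that value. The function name `rank_threads_by_cpu` and the file name in the usage line are hypothetical; the only assumption about the file is that it has the 3XMTHREADINFO and 3XMCPUTIME line layout shown in the excerpt.

```shell
# Rank threads in a javacore by their "CPU usage total" value, highest first.
# Output: "<cpu seconds><TAB><thread name>", one line per thread.
rank_threads_by_cpu() {
    LC_ALL=C awk '
        # A 3XMTHREADINFO line starts a thread section; the name is in quotes.
        $1 == "3XMTHREADINFO" {
            name = "?"
            if (match($0, /"[^"]*"/)) name = substr($0, RSTART + 1, RLENGTH - 2)
        }
        # The 3XMCPUTIME line carries the cumulative CPU seconds of that thread.
        $1 == "3XMCPUTIME" {
            for (i = 1; i < NF; i++)
                if ($i == "total:") { printf "%.3f\t%s\n", $(i + 1), name; break }
        }
    ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr
}
```

Usage, with a hypothetical file name: `rank_threads_by_cpu javacore.txt | head -n 5`. The threads at the top of the list are the ones whose Java call stacks are worth reading first.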

 

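We also noticed that a large number of .saf (Storage And Forward) files from the EEM SDK layer are being created in your log folder. This is consistent with the stack above, where SafeCache.processOldSafFiles is listing a directory through File.listFiles: the more files accumulate, the more expensive each pass becomes.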
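
To see where the .saf files are piling up, a per-directory count can help. The following is a small sketch using standard `find`, `sed`, `sort`, and `uniq`; the function name `count_saf_by_dir` is hypothetical, and the default root is the WCC log path used in this article, which may differ on your installation.

```shell
# Print "<count> <directory>" for every directory under the given root
# that holds *.saf files, largest count first.
count_saf_by_dir() {
    root=${1:-/opt/CA/WorkloadAutomationAE/wcc/log}
    # The sed expression strips the file name and keeps only the directory.
    find "$root" -type f -name '*.saf' |
        sed 's|/[^/]*$||' |
        sort | uniq -c | sort -rn
}
```

Running `count_saf_by_dir` with no argument scans the default path. A directory with a count far above the others is the likely source of the directory-listing overhead.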

Resolution

1) Stop WCC.

2) The folder that contains the large number of *.saf files on your WCC box is most likely /opt/CA/WorkloadAutomationAE/wcc/log/eem/audit
(you can search for them recursively under /opt/CA/WorkloadAutomationAE/wcc/log).

3) If so, move this folder to a different location, for example /opt/CA/WorkloadAutomationAE/wcc/audit_backup

4) Make sure that no large number of *.saf files remains anywhere under the /opt/CA/WorkloadAutomationAE/wcc/log folder.

5) Restart WCC.
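
Steps 2 to 4 can be sketched as a small shell function. This is an illustration under stated assumptions, not a supported tool: the function name `move_saf_audit_dir` is hypothetical, the paths are the ones given in this article, and stopping and restarting WCC (steps 1 and 5) is left to your usual service procedure.

```shell
# Sketch of steps 2-4. Run only while WCC is stopped (step 1) and
# restart WCC afterwards (step 5).
# The default path comes from this article; pass your own WCC home if it differs.
move_saf_audit_dir() {
    wcc_home=${1:-/opt/CA/WorkloadAutomationAE/wcc}
    audit_dir=$wcc_home/log/eem/audit
    backup_dir=$wcc_home/audit_backup

    # Step 2: confirm that the audit folder really holds *.saf files.
    n=$(find "$audit_dir" -type f -name '*.saf' 2>/dev/null | wc -l | tr -d ' ')
    if [ "$n" -eq 0 ]; then
        echo "no *.saf files under $audit_dir; nothing moved"
        return 0
    fi

    # Step 3: move the folder aside; refuse to overwrite an earlier backup.
    if [ -e "$backup_dir" ]; then
        echo "backup location $backup_dir already exists" >&2
        return 1
    fi
    mv "$audit_dir" "$backup_dir" || return 1

    # Step 4: report how many *.saf files remain anywhere under the log folder.
    left=$(find "$wcc_home/log" -type f -name '*.saf' | wc -l | tr -d ' ')
    echo "moved $n files; $left *.saf files remain under $wcc_home/log"
}
```

If the final message reports a large remaining count, the files are also accumulating in another folder under the log directory, and that folder needs the same treatment before WCC is restarted.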