UMA deployment agent constantly restarting


Article ID: 224022


Updated On:


CA Application Performance Management (APM / Wily / Introscope)


The deployment agent is constantly restarting.

It is unclear how to debug why this is happening, or whether the parameters in use need tuning.

The container log file doesn't show anything obvious, and it is overwritten each time the container restarts.

Details below:

sh-4.2$ uname -a
Linux container-monitor-d5dcbbc66-dw2n7 3.10.0-1160.31.1.el7.x86_64 #1 SMP Wed May 26 20:18:08 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Restarted 99 times since the pod was created:

container-monitor-d5dcbbc66-dw2n7   1/1       Running     99         9d    

These are the parameters in use (excerpt from kubectl describe pod output):

    Restart Count:  99
    Limits:
      cpu:     2
      memory:  1G
    Requests:
      cpu:     200m
      memory:  300Mi
    Liveness:  http-get http://:8888/healthz delay=60s timeout=1s period=60s #success=1 #failure=3

      MIN_HEAP_VAL_IN_MB:                                                            512
      MAX_HEAP_VAL_IN_MB:                                                            1024
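As a rough guide to the liveness settings shown above, the kubelet restarts the container only after the failure threshold of consecutive probe failures is reached, one probe per period. A sketch of the worst-case detection time, using the figures from this pod spec:

```shell
# Rough timing of the liveness probe above: with period=60s and a
# failure threshold of 3, the kubelet restarts the container only after
# ~3 consecutive failed probes, i.e. up to ~180s of unhealthiness
# (on top of the initial delay=60s after container start).
period_s=60
failure_threshold=3
initial_delay_s=60
echo "liveness restart after ~$(( period_s * failure_threshold ))s of failed probes"
```

In this case, however, the restarts turned out to be OOM kills rather than liveness failures, so these probe parameters were not the cause.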

sh-4.2$ ps -ef | grep -i wily
uma         58    55 99 08:16 ?        02:10:18 /usr/local/openshift/apmia/jre/bin/java -server -classpath /usr/local/openshift/apmia/lib/* -Xms512m -Xmx1024m -XX:ErrorFile=logs/jvm_error.%p.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=logs/ com.wily.introscope.agent.uma.UnifiedMonitoringAgent
uma        917   899  0 10:22 ?        00:00:00 grep -i wily
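A quick arithmetic check on the figures above is revealing. Kubernetes interprets a memory limit of 1G as 10^9 bytes (about 953 MiB, unlike 1Gi which is 2^30 bytes), which is smaller than the agent's 1024 MiB maximum heap (-Xmx1024m, confirmed in the ps output). The JVM heap alone can therefore push the container past its cgroup limit:

```shell
# Compare the JVM max heap against the container memory limit.
# Kubernetes "1G" means 10^9 bytes; "1Gi" would be 2^30 bytes.
# Converted to MiB, the 1G limit is smaller than the 1024 MiB max heap,
# so the agent JVM alone can exceed the cgroup limit and be OOMKilled.
limit_mib=$(( 1000000000 / 1048576 ))   # "1G" limit expressed in MiB
max_heap_mib=1024                       # MAX_HEAP_VAL_IN_MB / -Xmx1024m
echo "limit=${limit_mib}MiB maxHeap=${max_heap_mib}MiB"
if [ "$max_heap_mib" -ge "$limit_mib" ]; then
  echo "max heap >= container limit: OOMKill risk"
fi
```

On top of the heap, the JVM also needs non-heap memory (metaspace, thread stacks, GC structures), which makes the shortfall worse in practice.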



Suggested diagnostic steps. All of this information can be shared with support when raising a case:

Run oc get all (or kubectl get all) to get an overview of the deployment.

Focus on pods that have restarted many times, and note the exact pod and container names for use in the subsequent commands:

pod/clusterinfo-6f756ccd5c-d7tww        1/1       Running     2          24d
pod/container-monitor-d5dcbbc66-dw2n7   1/1       Running     101        10d

pod/app-container-monitor-9bjrj         2/2       Running     2          3d
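In a namespace with many pods, the high-restart ones can be picked out by filtering the RESTARTS column. A minimal sketch; the embedded sample stands in for live "kubectl get pods -n caapm" output, and the threshold of 10 is an arbitrary choice:

```shell
# Filter a pod listing for high restart counts. The here-document stands
# in for live "kubectl get pods -n caapm" output; on a cluster, pipe the
# real command output into the same awk filter instead.
threshold=10
cat <<'EOF' > pods.txt
NAME                                READY   STATUS    RESTARTS   AGE
clusterinfo-6f756ccd5c-d7tww        1/1     Running   2          24d
container-monitor-d5dcbbc66-dw2n7   1/1     Running   101        10d
app-container-monitor-9bjrj         2/2     Running   2          3d
EOF
awk -v t="$threshold" 'NR > 1 && $4+0 > t { print $1, "restarts:", $4 }' pods.txt
```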

Collect the output of the following commands:
kubectl describe pod container-monitor-d5dcbbc66-dw2n7 -n caapm
kubectl describe pod app-container-monitor-9bjrj -n caapm
kubectl describe pod clusterinfo-6f756ccd5c-d7tww -n caapm
Collect the logs of the above pods with the following commands:
kubectl logs container-monitor-d5dcbbc66-dw2n7 -n caapm
kubectl logs --previous container-monitor-d5dcbbc66-dw2n7 -n caapm   (the --previous flag retrieves the log of the prior, crashed container instance, since the current log is overwritten on each restart)
kubectl logs app-container-monitor-9bjrj -c podmonitor -n caapm
kubectl logs app-container-monitor-9bjrj -c containerinfo -n caapm
kubectl exec -it clusterinfo-6f756ccd5c-d7tww -n caapm -- bash
 -> Inside the clusterinfo pod/container, collect clusterinfo.log from the logs directory
For this particular scenario, checking the describe output for the container-monitor pod showed that the container was being restarted because it was running out of memory (OOMKilled):

    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
    Ready:          True
    Restart Count:  228
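The termination details can also be pulled out of the describe output directly. A sketch using a saved copy; exit code 137 is 128 + 9, i.e. the process was killed with SIGKILL, which for a cgroup memory limit breach means OOMKilled:

```shell
# Extract the last termination reason and exit code from saved
# "kubectl describe pod" output. Exit code 137 = 128 + 9 (SIGKILL),
# the signature of the kernel OOM killer enforcing the cgroup limit.
cat <<'EOF' > describe.txt
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
    Ready:          True
    Restart Count:  228
EOF
grep -E 'Reason:|Exit Code:' describe.txt
```

On a live cluster the same fields are available without grep via, for example, kubectl get pod container-monitor-d5dcbbc66-dw2n7 -n caapm -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'.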


Release : 20.2

Component :



The memory limit was raised for the container, and the heap for the agent process inside it:

      cpu:     2
      memory:  2G


      MIN_HEAP_VAL_IN_MB:                                                            1024
      MAX_HEAP_VAL_IN_MB:                                                            2048
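Assuming these values live in the container-monitor deployment spec (the exact manifest layout depends on the UMA install, so treat this as illustrative), the change corresponds to something like:

```yaml
# Illustrative deployment excerpt: the resource limit and the heap
# variables are raised together. Field placement is an assumption;
# adjust to match the actual UMA manifest.
resources:
  limits:
    cpu: "2"
    memory: 2G
env:
  - name: MIN_HEAP_VAL_IN_MB
    value: "1024"
  - name: MAX_HEAP_VAL_IN_MB
    value: "2048"
```

Note that a 2048 MB max heap again sits right at the 2G limit; leaving some headroom above the max heap for non-heap JVM memory (metaspace, thread stacks) further reduces the chance of repeat OOM kills.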

This greatly reduced the number of restarts; more heap can be allocated if available and desired.