UMA deployment agent constantly restarting

Article ID: 224022

Products

CA Application Performance Management (APM / Wily / Introscope)

Issue/Introduction

The deployment agent is constantly restarting.

It is unclear how to debug why this is happening and whether the parameters in use need tuning.

The container log file doesn't show anything obvious and is overwritten each time the container restarts.

Details below:

sh-4.2$ uname -a
Linux container-monitor-d5dcbbc66-dw2n7 3.10.0-1160.31.1.el7.x86_64 #1 SMP Wed May 26 20:18:08 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

The pod has restarted 99 times since it was created:

container-monitor-d5dcbbc66-dw2n7   1/1       Running     99         9d        22.249.87.61    
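 
Pod status such as the above can be listed with, for example (the caapm namespace is taken from the commands used later in this article):
 
kubectl get pods -n caapm -o wide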

These are the parameters in use (excerpt):

Restart Count:  99
    Limits:
      cpu:     2
      memory:  1G
    Requests:
      cpu:     200m
      memory:  300Mi
    Liveness:  http-get http://:8888/healthz delay=60s timeout=1s period=60s #success=1 #failure=3
    Environment:

      MIN_HEAP_VAL_IN_MB:                                                            512
      MAX_HEAP_VAL_IN_MB:                                                            1024

sh-4.2$ ps -ef | grep -i wily
uma         58    55 99 08:16 ?        02:10:18 /usr/local/openshift/apmia/jre/bin/java -server -classpath /usr/local/openshift/apmia/lib/* -Xms512m -Xmx1024m -XX:ErrorFile=logs/jvm_error.%p.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=logs/ com.wily.introscope.agent.uma.UnifiedMonitoringAgent
uma        917   899  0 10:22 ?        00:00:00 grep -i wily

 

Cause

Suggested diagnostic steps; all of this information can be shared with support when raising a case.

Run oc get all to get an overview of the deployment.

Focus on pods that have restarted many times, and note the exact pod and container names for use in the subsequent commands.

pod/clusterinfo-6f756ccd5c-d7tww        1/1       Running     2          24d
pod/container-monitor-d5dcbbc66-dw2n7   1/1       Running     101        10d

pod/app-container-monitor-9bjrj         2/2       Running     2          3d

Capture the output of the following commands:
 
kubectl describe pod container-monitor-d5dcbbc66-dw2n7 -n caapm
kubectl describe pod app-container-monitor-9bjrj -n caapm
kubectl describe pod clusterinfo-6f756ccd5c-d7tww -n caapm
 
Collect the logs of the above pods by executing the commands below:
 
kubectl logs container-monitor-d5dcbbc66-dw2n7 -n caapm
kubectl logs app-container-monitor-9bjrj  -c podmonitor -n caapm
kubectl logs app-container-monitor-9bjrj  -c containerinfo -n caapm
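 
Because the current container log is overwritten on each restart, the log from the previous (crashed) container instance can also be captured, for example:
 
kubectl logs --previous container-monitor-d5dcbbc66-dw2n7 -n caapm
kubectl logs --previous app-container-monitor-9bjrj -c podmonitor -n caapm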
 
kubectl exec -it clusterinfo-6f756ccd5c-d7tww -n caapm -- bash
 -> Inside the clusterinfo pod/container, get clusterinfo.log from the logs directory.
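 
    For example, the file can be copied out of the pod (the logs path below is a placeholder; substitute the actual logs directory inside the clusterinfo container):
 
    kubectl cp caapm/clusterinfo-6f756ccd5c-d7tww:<logs-directory>/clusterinfo.log ./clusterinfo.log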
 
For this particular scenario, checking the describe output for the container-monitor pod showed that the container was being restarted due to an out-of-memory condition:
 
Containers:
  uma:

    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      
    Ready:          True
    Restart Count:  228
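 
Exit code 137 with reason OOMKilled means the container was killed for exceeding its memory limit. In the configuration above, the 1024 MB maximum heap is equal to the 1G container limit, leaving no headroom for JVM overhead (metaspace, thread stacks and other native memory), so the container is eventually OOM killed. Live memory consumption of the pod can be checked with, for example (requires the metrics API / metrics-server to be available):
 
kubectl top pod container-monitor-d5dcbbc66-dw2n7 -n caapm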

Environment

Release: 20.2

Component:

Resolution

 

The memory limit was raised for the container, and the heap was raised for the agent process inside it:

Limits:
      cpu:     2
      memory:  2G

 

      MIN_HEAP_VAL_IN_MB:                                                            1024
      MAX_HEAP_VAL_IN_MB:                                                            2048

This greatly reduced the number of restarts; more heap can be allocated if resources are available or desired.
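 
A minimal sketch of how these values might be applied in the container-monitor deployment spec (structure and names are illustrative; the actual UMA manifests or Helm values may differ):
 
    containers:
      - name: uma
        resources:
          limits:
            cpu: "2"
            memory: 2G
          requests:
            cpu: 200m
            memory: 300Mi
        env:
          - name: MIN_HEAP_VAL_IN_MB
            value: "1024"
          - name: MAX_HEAP_VAL_IN_MB
            value: "2048"
 
The change can then be applied with kubectl apply -f <manifest>, or the deployment can be edited in place with kubectl edit deployment container-monitor -n caapm (the deployment name here is an assumption based on the pod name shown above).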