DX APM - UMA - Troubleshooting and Common issues

Article ID: 212472

Updated On: 10-22-2023

Products

DX Application Performance Management
CA Application Performance Management SaaS
INTROSCOPE
CA Application Performance Management Agent (APM / Wily / Introscope)

Issue/Introduction

The following is a high-level list of techniques and suggestions to employ when troubleshooting UMA performance, display, and configuration issues.

 

 

Environment

DX APM 

Cause

 

 

Resolution

A) Official UMA Troubleshooting Guide

The official UMA Troubleshooting section is available here:

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-apm-agents/SaaS/Universal-Monitoring-Agent/universal-monitoring-agent-troubleshooting.html

IMPORTANT:

A log collection script, "getUmaLogs.sh", is shipped with the helm chart and the operator tar bundle.
For the operator tar, the script is available at "uma-operator/tools/getUmaLogs.sh".
For the helm chart, the script is available at "helm-chart/uma/tools/getUmaLogs.sh".
The latest script can also be downloaded from https://packages.broadcom.com/artifactory/apm-agents/getUmaLogs.sh
 
Prerequisite:
The script uses the 'kubectl' or 'oc' CLI tool to collect the logs, so it must be run on a node where the CLI utility is installed and has access to the cluster.
The user running the script must have permission to access the application namespaces and the UMA namespace, and to run 'kubectl' or 'oc' commands on the objects in these namespaces.
 
UMA logs collections:
Command:
./getUmaLogs.sh
 
Output:
With no parameters, the script collects all of the UMA pod logs, saves them under the <current directory>/CA_APM directory, and creates a tar file of the logs named 'umaLogs.tar'.
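The prerequisites above can be checked before running the script. The following is a minimal sketch only: pick_cli and check_cluster_access are hypothetical helper names that are not part of getUmaLogs.sh, and listing pods is used here as an assumed stand-in for "has access to the namespace".

```shell
# Sketch: verify the getUmaLogs.sh prerequisites.
# pick_cli and check_cluster_access are hypothetical helpers, not part of UMA.

# Print the first CLI found on PATH ("oc" is tried first, then "kubectl").
pick_cli() {
  for cli in oc kubectl; do
    if command -v "$cli" >/dev/null 2>&1; then
      echo "$cli"
      return 0
    fi
  done
  echo "neither oc nor kubectl found on PATH" >&2
  return 1
}

# Return 0 if the current user can list pods in the given namespace.
# $1 = CLI name (oc or kubectl), $2 = namespace
check_cluster_access() {
  "$1" get pods -n "$2" >/dev/null 2>&1
}
```

Example usage, assuming UMA is deployed in the caapm namespace: cli=$(pick_cli) && check_cluster_access "$cli" caapm && ./getUmaLogs.sh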
 

B) Common issues

 

USE-CASE#1: UMA metrics not reporting due to EM/Agent metric clamps reached

Suggestion#1: Check if EM or Agent metric clamps have been reached.

a) To check the EM clamps: Open the Metric Browser and expand the branch

Custom Metric Host (virtual) | Custom Metric Process (virtual) | Custom Metric Agent (virtual)(collector_host@port)(SuperDomain) | Enterprise manager | Connections

Look at the values for:

  - "EM Historical Metric Clamped"

  - "EM Live Metric Clamped"

The above metrics should all be 0.


b) To check the Agent clamp: Expand the branch

Custom Metric Host (virtual) | Custom Metric Process (virtual) | Custom Metric Agent (virtual)(collector_host@port)(SuperDomain) | Agents | Host | Process | <AgentName>

Look at the value of the "is Clamped" metric; it should be 0.

 

Suggestion#2:  Restart UMA 

There is no restart script. Instead, delete all existing UMA pods. First list the pods:

kubectl get pods -n caapm

Then delete each pod using:

kubectl delete pod <podname> -n caapm

NOTE: The pods can be deleted in any order.
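Because the order does not matter, the pods can also be deleted in one command. The following is a minimal sketch, assuming that UMA runs in the caapm namespace and that the pods are recreated by their controllers after deletion; restart_uma is a hypothetical helper name.

```shell
# Sketch: restart UMA by deleting every pod in its namespace.
# restart_uma is a hypothetical helper; the namespace defaults to caapm.
restart_uma() {
  uma_ns="${1:-caapm}"
  # Delete all pods at once; order does not matter.
  kubectl delete pods --all -n "$uma_ns"
  # List the pods again to confirm that they are being recreated.
  kubectl get pods -n "$uma_ns"
}
```

Example usage: restart_uma caapm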

 

USE-CASE#2: No OpenShift folder under UMA deployment agent

This could be a UMA configuration issue, for example missing or corrupted UMA cluster roles.

Use the steps below to verify this condition and fix the problem:

Suggestion:

1) Check for errors in the <clusterinfo-pod-name> log

oc logs <clusterinfo-pod-name>

2) Check the clusterInfo log from inside the pod

oc rsh <clusterinfo-pod-name>

cd /tmp

cat clusterInfo.log

NOTE: If you cannot log in to the pod, try to restart it using: oc delete po <clusterinfo-pod-name>. Being unable to log in is also an indication of a configuration issue.


Here is an example of the message that confirms the permission issue:

WARN io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.onFailure - Exec Failure: HTTP 401, Status: 401 - Unauthorized
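A quick way to look for this message is to count the matching lines in the pod log. The following is a minimal sketch; count_unauthorized is a hypothetical helper name, and matching on the text "HTTP 401" is an assumption based on the example message above.

```shell
# Sketch: count log lines that report an HTTP 401 (reads the log on stdin).
# count_unauthorized is a hypothetical helper name.
count_unauthorized() {
  # grep -c prints 0 and exits non-zero when nothing matches,
  # so "|| true" keeps the function's exit status at 0.
  grep -c 'HTTP 401' || true
}
```

Example usage: oc logs <clusterinfo-pod-name> | count_unauthorized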

Solution:

1. Download the attached clusterroles_uma_caapm.yaml and copy it to your OpenShift environment.

2. Recreate the UMA cluster roles and delete the affected pods so that they restart:

oc delete -f clusterroles_uma_caapm.yaml -n caapm
oc create -f clusterroles_uma_caapm.yaml -n caapm

oc delete pod <clusterinfo-pod-name>
oc delete pod <container-monitor-pod-name>

3. Verify that <clusterinfo-pod-name> is no longer restarting and that the "Unauthorized" error is no longer reported in clusterInfo.log.

 

USE-CASE#3: Java AutoAttach is not working

Checklist

1) Review the app-container-monitor pod logs.

2) Check that there are no errors in the podmonitor container:

kubectl logs <app-container-pod-name> -c podmonitor -n caapm

3) The expected message confirming that the agent has been injected into the pod is:

[INFO] [IntroscopeAgent.AutoAttach.Java.UnixContainerAttacher] Attach successful for pid 1 in container 

4) A /tmp/ca-deps/wily folder should have been created in your app pod. Here is an example of how to confirm that the Java agent has been attached (in this example the application namespace is dockerapp):

kubectl exec -ti <your-app-pod> -n dockerapp -- bash

cd /tmp/ca-deps/wily/

ls
Agent.jar                  common      core    examples    logs
AgentNoRedefNoRetrans.jar  connectors  deploy  extensions  tools

cd logs

ls -l
total 1496
-rw-r-----. 1 root root 1370757 Jul 27 16:56 AutoProbe.log
-rw-r-----. 1 root root  159419 Jul 27 16:56 IntroscopeAgent.log

 

5) Make sure the "tar" command is available in the app image

The "tar" Unix command is required to be able to uninstall the agent package.

When a required utility is missing from the image, the app-container-monitor-<podname>.log shows an error like the following (in this example, "xargs" is the missing command):

dataprovider.go:567] Err while executing command '["sh" "-c" "systick=$(getconf CLK_TCK); for c in /proc/*/cmdline; do d=$(dirname $c); name=$(grep Name: $d/status 2>/dev/null) || continue; pid=$(basename $d); uid=$(grep Uid: $d/status 2>/dev/null) || continue; uid=$(echo ${uid#Uid:} | xargs); uid=${uid%% *}; cmdline=$(cat $c|xargs -0 echo 2>/dev/null) || continue; starttime=$(($(awk '{print $22}' $d/stat 2>/dev/null || echo 0) / systick)); uptime=$(awk '{print int($1)}' /proc/uptime); elapsed=$(($uptime-$starttime)); echo $pid $uid $elapsed $cmdline; done"]' for container "#######": err: <nil>, result -> out: , err: sh: xargs: command not found
sh: xargs: command not found
sh: xargs: command not found
..

6) Check for possible Memory issues

[INFO] [IntroscopeAgent.AutoAttach.Java.UnixDockerAttacher] Not enough free memory available on host to attach to unbounded container . Skipping attach

 Container ... has lesser memory than configured free memory threshold of 50.0%, Skipping attach

Recommendation:

Change the default free memory threshold to 25% by setting the value of the env variable below:
        - name: apmenv_autoattach_free_memory_threshold
          value: "25.00"

When using the Operator, you cannot change anything on the UMA side because the Operator reverts the change. In this case, set the annotation at the application pod or deployment level as below:

oc annotate pod <pod-name> ca.broadcom.com/autoattach.java.attach.overrides=autoattach.free.memory.threshold=20 -n <app-ns> --overwrite
oc annotate deployment <deployment-name> ca.broadcom.com/autoattach.java.attach.overrides=autoattach.free.memory.threshold=20 -n <app-ns> --overwrite


7) Check for a possible unsupported JVM

[INFO] [IntroscopeAgent.AutoAttach.Java.UnixDockerAttacher] Process 1 in container .. is an unsupported JVM. Skipping attach. JVMInfo: JVMInfo{ binaryPath='/usr/lib/jvm/java-1.8-openjdk/jre/bin/java', vendorName='IcedTea', vmName='OpenJDK 64-Bit Server VM', runtimeVersion='1.8.0_212-b04', specificationVersion='8' }

OR

[INFO] [IntroscopeAgent.AutoAttach.Java.UnixContainerAttacher] Could not retrieve tools.jar in container [ namespace...9d3ddc58 ], please set autoattach.java.tools.repo.url property via annotation or autoattach property and restart app container. See details in the APM documentation for use.

[INFO] [IntroscopeAgent.AutoAttach.Java.UnixContainerAttacher] If this is WebSphere Liberty container, please use annotation ca.broadcom.com/autoattach.java.attach.overrides: autoattach.java.filter.jvms=false

Recommendation:

Add the env variable below to the podmonitor container (in the same section where the above memory threshold env is present). This makes UMA try to attach Java agents to containers that are using unsupported JVMs.

        - name: apmenv_autoattach_java_filter_jvms
          value: "false"

When using the Operator, you cannot change anything on the UMA side because the Operator reverts the change. In this case, set the annotation at the application pod or deployment level as below:

oc annotate pod <pod-name> ca.broadcom.com/autoattach.java.attach.overrides=autoattach.java.filter.jvms=false -n <app-ns> --overwrite
oc annotate deployment <deployment-name> ca.broadcom.com/autoattach.java.attach.overrides=autoattach.java.filter.jvms=false -n <app-ns> --overwrite


8) Check if the Java agent cannot be injected because of a permission issue

A non-root user may not be able to create the new directory in the pod into which the Java agent is copied.

Recommendation:

Exec into the container, create a folder such as /opt (or any other location), and then use the annotation below so that the Java agent is deployed in that folder:

kubectl annotate pod <application podname> ca.broadcom.com/autoattach.java.attach.overrides=autoattach.java.agent.deps.directory=/opt

If that works, modify your Docker app image(s) to include a writable directory that UMA can use to inject the agent.


9) Check if the issue is related to java itself 

[INFO] [IntroscopeAgent.AutoAttach.Java.UnixDockerEnricher] Process 1 in container .. could not get jvm information. Skipping attach

Recommendation:

Exec into the pod and try to run java, and make sure it runs successfully. If java itself fails to run inside the container, UMA cannot get the JVM information, which explains the above message and why the Java agent could not be added to the container.

In one specific case the solution was to remove JAVA_TOOL_OPTIONS. Contact your application team to fix this kind of Java issue.
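The log messages in this checklist can be matched automatically when reviewing a long podmonitor log. The following is a minimal sketch: classify_attach_log is a hypothetical helper name, the match strings are copied from the example messages above, and the hint wording is illustrative rather than product output.

```shell
# Sketch: read podmonitor log lines on stdin and print one hint per
# recognized AutoAttach message. classify_attach_log is a hypothetical helper.
classify_attach_log() {
  while IFS= read -r line; do
    case "$line" in
      *"Attach successful"*)
        echo "OK: agent attached" ;;
      *"xargs: command not found"*)
        echo "TOOLS: a required utility is missing from the app image" ;;
      *"Not enough free memory"*|*"free memory threshold"*)
        echo "MEMORY: review autoattach.free.memory.threshold" ;;
      *"unsupported JVM"*)
        echo "JVM: consider autoattach.java.filter.jvms=false" ;;
      *"Could not retrieve tools.jar"*)
        echo "TOOLS-JAR: review autoattach.java.tools.repo.url" ;;
      *"could not get jvm information"*)
        echo "JAVA: check that java runs inside the container" ;;
    esac
  done
}
```

Example usage: kubectl logs <app-container-pod-name> -c podmonitor -n caapm | classify_attach_log. Lines that match none of the patterns are ignored, and a message that repeats in the log produces a repeated hint.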

 

USE-CASE#4: UMA not picking up the agent name defined in the environment variable apmenv_introscope_agent_agentName

If you have already configured dynamic property resolution, either on the UMA side or through autoattach override annotations, that configuration takes precedence over the environment variable.

To check this, look at the "/tmp/ca-deps/ca-apm-java-agent.options" file. If a property for the agent name has already been added to this file, it takes precedence over the environment variable.
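The check can be done without opening a shell in the pod. The following is a minimal sketch: show_agent_name_overrides is a hypothetical helper name, and matching on the text "agentName" is an assumption, because the exact property syntax used inside the options file is not shown in this article.

```shell
# Sketch: print the lines of an options file (read on stdin) that mention
# an agent name. show_agent_name_overrides is a hypothetical helper, and
# the "agentName" match text is an assumption about the file contents.
show_agent_name_overrides() {
  # Case-insensitive match; "|| true" keeps the exit status at 0
  # when no line matches.
  grep -i 'agentName' || true
}
```

Example usage: kubectl exec <your-app-pod> -n <app-ns> -- cat /tmp/ca-deps/ca-apm-java-agent.options | show_agent_name_overrides. Any output suggests that the file already defines an agent name.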

For more information refer to https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-apm-agents/SaaS/Universal-Monitoring-Agent/Install-the-Universal-Monitoring-Agent/Install-UMA-for-OpenShift/Install-and-configure-uma-using-openshift-operator.html 

 

USE-CASE#5: Missing metric from PODs / Containers

Checklist:

1)

ERROR c.c.a.b.s.OpenshiftClusterCrawlerService.watchDeploymentConfigs - error occurred in watchDeploymentConfigs, null

Exception in thread "OkHttp Dispatcher" java.lang.OutOfMemoryError: unable to create new native thread
     at java.lang.Thread.start0(Native Method)
     at java.lang.Thread.start(Thread.java:717)

Solution

The clusterinfo Java process has been given insufficient memory. Increase the max heap to 1024m, as shown in the line below, and redeploy UMA. This line is part of the clusterinfo deployment configuration in the UMA yaml file.

command: ["/usr/local/openshift/apmia/jre/bin/java", "-Xms64m","-Xmx1024m", "-Dlogging.config=file:/usr/local/openshift/logback.xml", "-jar", "/clusterinfo-1.0.jar"]


2) 

oc logs pod/container-monitor-7dcdbc5fb8-6vcvq

[ERROR] [IntroscopeAgent.GraphSender] error occurred while sending graph to EM, null
java.lang.NullPointerException
     at com.ca.apm.clusterdatareporter.K8sMetaDataGraphAttributeDecorator.getGraph(K8sMetaDataGraphAttributeDecorator.java:107)
     at java.lang.Iterable.forEach(Iterable.java:75)
     at com.ca.apm.clusterdatareporter.K8sMetaDataGraphAttributeDecorator.getGraph(K8sMetaDataGraphAttributeDecorator.java:93)

Reason:

If you are using EM 10.7, this error can be ignored; there is no loss of functionality.

 

3) The error below is reported continuously, every 2 minutes:

[ERROR] [IntroscopeAgent.GraphSender] error occurred while sending graph to EM, null
java.lang.NullPointerException

Solution:

If you are using SaaS APM, set agentManager_version to an empty value by using the following parameter in the yaml: agentManager_version: ""

NOTE: If you are using APM EM 10.7, set agentManager_version to 10.7. This is required to allow UMA to connect to APM EM 10.7. This property is equivalent to "introscope.agent.connection.compatibility.version" in the Java agent.

 

USE-CASE#6: Liveness probe failed: find: /tmp/apmia-health/extensions/Docker-health.txt: No such file or directory

Many app-container-monitor pods report the above message.

This is a known issue fixed in 21.4.

Recommendation: Upgrade to UMA 21.11 or later.

 

C) How to enable DEBUG logging?

Run the annotation command below on the app deployment and then restart the application pod. The new properties are applied during the Java agent attach for that specific app deployment.
 
In OpenShift:
 
oc annotate deployment <deployment name> ca.broadcom.com/autoattach.java.agent.overrides="introscope.agent.log.level.root=DEBUG,introscope.agent.log.max.file.size=200MB" -n <namespace>
 
Once the log collection is done, remove the annotation (as below) and restart the pod to revert the debug properties.
 
oc annotate -n <namespace> deployment <deployment name> ca.broadcom.com/autoattach.java.agent.overrides- 

 

D) How to change Agent properties?

The properties of a Java agent attached to an application can be changed through the annotation method described in the section above. Below is another example that changes the values of multiple agent properties:

In OpenShift:

oc annotate deployment <deployment name> ca.broadcom.com/autoattach.java.agent.overrides="introscope.autoprobe.logclassdetails.enabled=true,introscope.autoprobe.enable.tracergroup.ClassLocationTracing=true,introscope.agent.log.level.root=DEBUG" -n <namespace> 
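When several properties are involved, the comma-separated annotation value can be assembled from individual key=value pairs. The following is a minimal sketch: join_overrides and annotate_cmd are hypothetical helper names, and annotate_cmd only prints the command so that it can be reviewed before it is run. The --overwrite flag is added on the assumption that an existing annotation should be replaced.

```shell
# Sketch: build the overrides annotation from key=value arguments.
# join_overrides and annotate_cmd are hypothetical helper names.

# Join all arguments with commas, e.g. "a=1 b=2" becomes "a=1,b=2".
join_overrides() {
  joined=""
  for kv in "$@"; do
    if [ -z "$joined" ]; then
      joined="$kv"
    else
      joined="$joined,$kv"
    fi
  done
  echo "$joined"
}

# Print (do not run) the oc command.
# $1 = deployment name, $2 = namespace, remaining arguments = key=value pairs
annotate_cmd() {
  deployment="$1"
  namespace="$2"
  shift 2
  echo "oc annotate deployment $deployment ca.broadcom.com/autoattach.java.agent.overrides=\"$(join_overrides "$@")\" -n $namespace --overwrite"
}
```

Example usage: annotate_cmd <deployment name> <namespace> introscope.agent.log.level.root=DEBUG introscope.agent.log.max.file.size=200MB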

 

E) What diagnostic files should I gather for Broadcom Support?

Option 1 (Recommended): use https://packages.broadcom.com/artifactory/apm-agents/getUmaLogs.sh to collect the full set of logs

Option 2: Collect the logs from the below pods:

- app-container-monitor-* (there should be 1 pod for each node)
- cluster-performance-prometheus-*
- clusterinfo-*
- container-monitor-*

Here is an example of the commands (if you are using OpenShift, you can use the "oc" command instead):

kubectl logs <app-container-monitor-pod-name> --all-containers -n caapm
kubectl logs <app-container-pod-name> -c podmonitor -n caapm
kubectl logs <cluster-performance-prometheus-pod-name> --all-containers -n caapm
kubectl logs <clusterinfo-pod-name> --all-containers -n caapm
kubectl logs <container-monitor-pod-name> --all-containers -n caapm

NOTE: If the issue is related to the Java agent not being injected as expected, the most important log to collect is the app-container-monitor pod log. There is one app-container-monitor pod on each node, so make sure to collect the log from the node where the issue is happening (the node where your Java application is running).
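If getUmaLogs.sh cannot be used, the per-pod commands can be wrapped in a loop. The following is a minimal sketch, assuming that UMA runs in the caapm namespace; collect_uma_logs, the UMA_CLI variable, and the ./uma-logs output directory are hypothetical names chosen for illustration.

```shell
# Sketch: save the logs of every pod in the UMA namespace to one file per pod.
# collect_uma_logs is a hypothetical helper. Set UMA_CLI=oc to use oc.
# $1 = namespace (default caapm), $2 = output directory (default ./uma-logs)
collect_uma_logs() {
  uma_ns="${1:-caapm}"
  out_dir="${2:-./uma-logs}"
  cli="${UMA_CLI:-kubectl}"
  mkdir -p "$out_dir" || return 1
  # "-o name" prints one "pod/<name>" entry per line.
  "$cli" get pods -n "$uma_ns" -o name | while IFS= read -r pod; do
    name="${pod#pod/}"
    "$cli" logs "$pod" --all-containers -n "$uma_ns" > "$out_dir/$name.log" 2>&1
  done
}
```

Example usage: collect_uma_logs caapm ./uma-logs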

Additional Information

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-apm-agents/SaaS/Universal-Monitoring-Agent/universal-monitoring-agent-troubleshooting.html 

Attachments

1641922250766__clusterroles_uma_caapm.yaml