DX APM - UMA - Troubleshooting and Common issues

Article ID: 212472

Products

DX Application Performance Management
CA Application Performance Management SaaS
INTROSCOPE
CA Application Performance Management Agent (APM / Wily / Introscope)

Issue/Introduction

The following is a high-level list of techniques and suggestions to employ when troubleshooting UMA performance, display, and configuration issues.

 

 

Environment

DX APM Agent 20.2, SaaS

Cause

 

 

Resolution

A) Common issues

USE-CASE#1: UMA metrics not reporting due to EM /Agent metric clamps reached

Suggestion#1: Check if EM or Agent metric clamps have been reached.

a) To check the EM clamps: open the Metric Browser and expand the branch

Custom Metric Host (virtual) | Custom Metric Process (virtual) | Custom Metric Agent (virtual)([email protected])(SuperDomain) | Enterprise manager | Connections

then look at the values for:

  - "EM Historical Metric Clamped"

  - "EM Live Metric Clamped"

The above metrics should all be 0.


b) To check the Agent clamp: expand the branch

Custom Metric Host (virtual) | Custom Metric Process (virtual) | Custom Metric Agent (virtual)([email protected])(SuperDomain) | Agents | Host | Process | <AgentName>

then look at the value of the "is Clamped" metric; it should be 0.

 

Suggestion#2:  Restart UMA 

There is no restart script; instead, delete all existing UMA pods. First list them:

kubectl get pods -n caapm

Then delete each pod using:

kubectl delete pod <podname> -n caapm

NOTE: The pods can be deleted in any order.
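
If the caapm namespace contains only UMA pods, they can all be deleted in one step; a convenience sketch, assuming no other workloads share the namespace:

# Deletes every pod in the namespace; the UMA Deployments/DaemonSets recreate them
kubectl delete pods --all -n caapm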

 

USE-CASE#2: No Openshift folder under UMA deployment agent

This could be a UMA configuration issue, for example missing or corrupted UMA cluster roles.

You can use the below steps to verify this condition and fix the problem:

Suggestion:

1) Check for errors in the <clusterinfo-pod-name> pod log:

oc logs <clusterinfo-pod-name>

2) Check the clusterInfo log from inside the pod:

oc rsh <clusterinfo-pod-name>

cd /tmp

cat clusterInfo.log

NOTE: If you cannot log in to the pod, try to restart it using: oc delete po <clusterinfo-pod-name>. This is also an indication of a configuration issue.


Here is an example of the message that confirms the permission issue:

WARN io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.onFailure - Exec Failure: HTTP 401, Status: 401 - Unauthorized

Solution:

1. Download the attached clusterroles_uma_caapm.yaml and copy it to your OpenShift cluster

2. Recreate UMA cluster roles:

oc delete -f clusterroles_uma_caapm.yaml -n caapm
oc create -f clusterroles_uma_caapm.yaml -n caapm

oc delete pod <clusterinfo-pod-name>
oc delete pod <container-monitor-pod-name>

3. Verify that <clusterinfo-pod-name> is no longer restarting and that the "Unauthorized" error is no longer reported in the clusterInfo.log
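
To double-check that the cluster roles and bindings were recreated, they can be listed with a generic filter; a sketch, assuming the UMA role names contain "uma" or "caapm":

oc get clusterroles,clusterrolebindings | grep -iE "uma|caapm"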

 

USE-CASE#3: Instrumentation of Java apps/pods not working

You need to review the app-container-monitor pod logs.

The expected message confirming that the agent is injected into the pod is:

7/27/22 04:53:09 PM GMT [INFO] [IntroscopeAgent.AutoAttach.Java.UnixContainerAttacher] Attach successful for pid 1 in container [ namespace/dockerapp pod wlp-845c4ffcd-8tq4z container/wlp id/0ca17ae4ea2b437687f9fcc880dc6203a93054a4afbb2d1ff9c735e7348a0914 ]

In this example, you can connect to the pod and confirm that the wily Java agent has been added under the /tmp/ca-deps/wily/ directory, as shown below:

kubectl exec -ti wlp-845c4ffcd-8tq4z bash -ndockerapp

cd /tmp/ca-deps/wily/

ls
Agent.jar                  common      core    examples    logs
AgentNoRedefNoRetrans.jar  connectors  deploy  extensions  tools

cd logs

ls -l
total 1496
-rw-r-----. 1 root root 1370757 Jul 27 16:56 AutoProbe.log
-rw-r-----. 1 root root  159419 Jul 27 16:56 IntroscopeAgent.log
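
To quickly check whether the attacher succeeded or skipped a given container, the podmonitor container log can be filtered; a sketch assuming the caapm namespace and the message patterns quoted in this article:

kubectl logs <app-container-monitor-pod-name> -c podmonitor -n caapm | grep -iE "attach successful|skipping attach"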

 

Checklist:

1) Check for possible Memory issues

[INFO] [IntroscopeAgent.AutoAttach.Java.UnixDockerAttacher] Not enough free memory available on host to attach to unbounded container [ namespace/digital-factory-uat pod/logstash-sync-db-to-elk-5cc7445d7d-zxrd6 container/logstash-sync-db-to-elk id/56f7bfa2617f2e0f26aa72c8906ee95de9db813c9571ba272b689ae0d87ea310 ]. Skipping attach

identified as Java process in container [ namespace/digital-factory-uat pod/elasticsearch-master-2 container/elasticsearch id/a3004d2bdddd9e246dc9109614f4441d7ddff98716e73e0470afb9cfc2979a49 ]
3/23/21 09:00:19 AM GMT [INFO] [IntroscopeAgent.AutoAttach.Java.UnixDockerAttacher] Container a3004d2bdddd9e246dc9109614f4441d7ddff98716e73e0470afb9cfc2979a49 has lesser memory than configured free memory threshold of 50.0%, Skipping attach

Recommendation:

Lower the default memory threshold (50%) to 25% by changing the value of the environment variable shown below in the podmonitor container of the UMA yaml:
        - name: apmenv_autoattach_free_memory_threshold
          value: "25.00"

When using the Operator you cannot change anything on the UMA side (the Operator will revert the change). In this case, set the annotation at the application pod or deployment level as below:

oc annotate pod <pod-name> ca.broadcom.com/autoattach.java.attach.overrides=autoattach.free.memory.threshold=20 -n <app-ns> --overwrite
oc annotate deployment <deployment-name> ca.broadcom.com/autoattach.java.attach.overrides=autoattach.free.memory.threshold=20 -n <app-ns> --overwrite
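
To see how much memory is actually in use on the node hosting the application pod, the Metrics API can be queried; a sketch that assumes metrics-server is installed in the cluster:

kubectl top node <node-name>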


2) Check for a possible unsupported JVM

[INFO] [IntroscopeAgent.AutoAttach.Java.UnixDockerAttacher] Process 1 in container [ namespace/digital-factory-uat pod/payment-jobs-7dcd9fd4fc-klm9r container/payment id/636dbbc6da32ff484798b67760810b05027907e190a4d72cc72876d08756472e ] is an unsupported JVM. Skipping attach. JVMInfo: JVMInfo{ binaryPath='/usr/lib/jvm/java-1.8-openjdk/jre/bin/java', vendorName='IcedTea', vmName='OpenJDK 64-Bit Server VM', runtimeVersion='1.8.0_212-b04', specificationVersion='8' }

OR

[INFO] [IntroscopeAgent.AutoAttach.Java.UnixContainerAttacher] Could not retrieve tools.jar in container [ namespace...9d3ddc58 ], please set autoattach.java.tools.repo.url property via annotation or autoattach property and restart app container. See details in the APM documentation for use.

1/19/23 10:24:35 AM GMT [INFO] [IntroscopeAgent.AutoAttach.Java.UnixContainerAttacher] If this is WebSphere Liberty container, please use annotation ca.broadcom.com/autoattach.java.attach.overrides: autoattach.java.filter.jvms=false

Recommendation:

Add the below env to the podmonitor container (in the same section where the above memory threshold env is present). This makes UMA try to attach the Java agent to containers that are using unsupported JVMs.

        - name: apmenv_autoattach_java_filter_jvms
          value: "false"

When using the Operator you cannot change anything on the UMA side (the Operator will revert the change). In this case, set the annotation at the application pod or deployment level as below:

oc annotate pod <pod-name> ca.broadcom.com/autoattach.java.attach.overrides=autoattach.java.filter.jvms=false -n <app-ns> --overwrite
oc annotate deployment <deployment-name> ca.broadcom.com/autoattach.java.attach.overrides=autoattach.java.filter.jvms=false -n <app-ns> --overwrite
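
To confirm that the annotation was applied, the pod's annotations can be inspected; a sketch using standard oc output options:

oc get pod <pod-name> -n <app-ns> -o jsonpath='{.metadata.annotations}'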


3) Check if the Java agent cannot be injected because of a permission issue

For example, a non-root user is not able to create a new directory in the pod into which the Java agent can be copied.

Recommendation:

"exec" into the container and then create a folder like /opt (or anything else) and then use the below annotation so Java agent is deployed in that folder:

kubectl annotate pod <application podname> ca.broadcom.com/autoattach.java.attach.overrides=autoattach.java.agent.deps.directory=/opt

If that works, modify your Docker application image(s) to provide a writable directory that UMA can use to inject the agent.
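
A quick way to confirm the chosen directory is writable by the container user; a sketch with a hypothetical test file name:

kubectl exec -ti <application podname> -n <app-ns> -- sh -c 'touch /opt/uma-write-test && echo writable && rm /opt/uma-write-test'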


4) Check if the issue is related to java itself 

9/07/21 06:33:09 AM GMT [INFO] [IntroscopeAgent.AutoAttach.Java.UnixDockerEnricher] Process 1 in container [ namespace/tams-test pod/tams--437- id/8cc8bc8221f6e0aa15873ea8e582158c083a4a1781e7295c3ede34ea7d2e6f7f ] could not get jvm information. Skipping attach

Recommendation:

Exec into the pod and try to execute java; make sure it runs successfully. A broken java binary or JVM configuration explains the above message and why the Java agent could not be added to the container.

In this specific case, the solution was to remove the JAVA_TOOL_OPTIONS environment variable. Contact your application team to fix this kind of Java issue.
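
A minimal check from outside the pod, assuming java is on the container PATH:

kubectl exec -ti <application podname> -n <app-ns> -- java -version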

 

USE-CASE#4: Missing metrics from Pods / Containers

Checklist:

1) The clusterinfo pod reports an OutOfMemoryError, for example:

20-05-2021 10:37:56 [pool-5-thread-8] ERROR c.c.a.b.s.OpenshiftClusterCrawlerService.watchDeploymentConfigs - error occurred in watchDeploymentConfigs, null

Exception in thread "OkHttp Dispatcher" java.lang.OutOfMemoryError: unable to create new native thread
     at java.lang.Thread.start0(Native Method)
     at java.lang.Thread.start(Thread.java:717)

Solution

Insufficient memory was given to the clusterinfo Java process. Increase the max heap to 1024m, as shown in the line below, and redeploy UMA. This line is part of the clusterinfo deployment configuration in the UMA yaml file. Changing the memory should resolve the issue.

command: ["/usr/local/openshift/apmia/jre/bin/java", "-Xms64m","-Xmx1024m", "-Dlogging.config=file:/usr/local/openshift/logback.xml", "-jar", "/clusterinfo-1.0.jar"]
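
If UMA is managed directly (not through the Operator), the change can also be applied in place; a sketch that assumes the clusterinfo workload is a Deployment whose exact name may differ in your environment:

# Edit the clusterinfo deployment, change -Xmx in the "command:" line to -Xmx1024m and save;
# the pod is recreated automatically with the new heap setting
kubectl edit deployment <clusterinfo-deployment-name> -n caapm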


2) The container-monitor pod reports a NullPointerException while sending the graph to the EM, for example:

oc logs pod/container-monitor-7dcdbc5fb8-6vcvq

5/21/21 12:48:20 PM UTC [ERROR] [IntroscopeAgent.GraphSender] error occurred while sending graph to EM, null
java.lang.NullPointerException
     at com.ca.apm.clusterdatareporter.K8sMetaDataGraphAttributeDecorator.getGraph(K8sMetaDataGraphAttributeDecorator.java:107)
     at java.lang.Iterable.forEach(Iterable.java:75)
     at com.ca.apm.clusterdatareporter.K8sMetaDataGraphAttributeDecorator.getGraph(K8sMetaDataGraphAttributeDecorator.java:93)
     at com.ca.apm.clusterdatareporter.K8sMetadataGraphHelper$GraphSender.sendGraph(K8sMetadataGraphHelper.java:601)
     at com.ca.apm.clusterdatareporter.K8sMetadataGraphHelper$GraphSender.sendGraphInBatches(K8sMetadataGraphHelper.java:580)
     at com.ca.apm.clusterdatareporter.K8sMetadataGraphHelper$GraphSender.lambda$1(K8sMetadataGraphHelper.java:548)
     at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at java.lang.Thread.run(Thread.java:748)

Reason:

If you are using a 10.7 EM, this error can be ignored; there is no loss of functionality.

 

3) The below error is reported continuously (every 2 minutes):

5/25/21 08:48:20 AM UTC [ERROR] [IntroscopeAgent.GraphSender] error occurred while sending graph to EM, null
java.lang.NullPointerException

Solution:

If you are using SaaS APM, you need to set agentManager_version to an empty value, i.e. set agentManager_version: "" in the yaml.

NOTE: If you are using APM EM 10.7, you need to set the version to 10.7 instead, as in the example below. This is required to allow UMA to connect to APM EM 10.7. This property is the equivalent of "introscope.agent.connection.compatibility.version" in the Java agent.
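
A sketch of the two settings in the UMA yaml (the surrounding structure depends on your deployment file):

# SaaS APM
agentManager_version: ""

# On-premises APM EM 10.7
agentManager_version: "10.7"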

 

USE-CASE#5: Liveness probe failed: find: /tmp/apmia-health/extensions/Docker-health.txt: No such file or directory

You notice that many app-container-monitor pods report the above message.

This is a known issue fixed in 21.4.

Recommendation: upgrade to UMA 21.11 or a later version.

 

B) What diagnostic files should I gather for Broadcom Support?

Collect logs from the following pods:

- app-container-monitor-* (there should be 1 pod for each node)
- cluster-performance-prometheus-*
- clusterinfo-*
- container-monitor-*

Here is an example of the commands (if you are using OpenShift, you can use the "oc" command instead):

kubectl logs <app-container-pod-name> --all-containers -n caapm
kubectl logs <cluster-performance-prometheus> --all-containers -n caapm
kubectl logs <clusterinfo-pod-name> --all-containers -n caapm
kubectl logs <container-monitor-pod-name> --all-containers -n caapm
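
To collect the logs from every UMA pod in one pass, a simple loop can be used; a convenience sketch, assuming a Bash shell and the caapm namespace:

for p in $(kubectl get pods -n caapm -o name); do
  # writes one <pod-name>.log file per pod, all containers included
  kubectl logs "$p" --all-containers -n caapm > "$(basename "$p").log"
done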

 

First, check the podmonitor container in the DaemonSet pods; there should not be any connection errors:

kubectl logs <app-container-pod-name> -c podmonitor -n caapm
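
To narrow the output down to likely connection problems, the log can be filtered; a sketch whose search terms are generic suggestions rather than exact UMA log strings:

kubectl logs <app-container-pod-name> -c podmonitor -n caapm | grep -iE "error|refused|timed out|unable to connect"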

 

NOTE: If the issue is related to the Java agent not getting injected as expected, the most important log to collect is from the app-container-monitor pod. There should be 1 app-container-monitor pod on each node, so make sure to collect the log from the node where the issue is happening (the node where your Java application is running).

Additional Information

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-apm-saas/SaaS/implementing-agents/Universal-Monitoring-Agent/Install-the-Universal-Monitoring-Agent/Install-and-Configure-UMA-for-Kubernetes,-Google-Kubernetes-Engine,-and-Azure-Kubernetes-Service-Monitoring.html

Attachments

1641922250766__clusterroles_uma_caapm.yaml