In the APM UI, the Universal Monitoring Agent (UMA agent) appears as connected, but the metrics view does not show any metrics for the Kubernetes agent. The UMA agent is installed in the cluster and logs the following exceptions:
[ERROR] [IntroscopeAgent.DefaultMetricCollectionServiceImpl] error occured while getting the api-response (onFailure) from the api-endpoint, http://10.xxx.xxx.49:31314/cluster/namespaces/nodes/stat
java.io.IOException: Canceled
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:260)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[ERROR] [IntroscopeAgent.KubernetesMonitor.PlatformMonitor] Failed to connect http://10.xxx.xxx.49:31314/pods?ns=<namespace>
Release : 20.2
Component : APM Agents
This error means that the UMA "clusterinfo" service is not reachable at the address and port shown in the log (node port 31314 in this example).
There are three possible causes:
1. The UMA clusterinfo pod/service has been brought down by the cluster admin (for an upgrade or other maintenance).
2. The UMA clusterinfo service port has been changed to a different port.
3. The UMA clusterinfo pod is unhealthy and cannot serve requests.
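To tell these causes apart, it can help to look at the pod and the service side by side. The helper below is a minimal sketch, not an official UMA tool: the function name is hypothetical, the namespace is whatever UMA was installed into, and it assumes the pod and service names contain "clusterinfo".

```shell
# Sketch: list pods and services whose names contain "clusterinfo".
# Assumptions: the UMA namespace is passed as $1, and the resource
# names contain the string "clusterinfo".
clusterinfo_state() {
  ns="$1"
  echo "--- pods ---"
  kubectl get pods -n "$ns" -o wide | grep -i clusterinfo \
    || echo "no clusterinfo pod found"
  echo "--- services ---"
  kubectl get svc -n "$ns" | grep -i clusterinfo \
    || echo "no clusterinfo service found"
}
```

Call it as clusterinfo_state <namespace>. A missing pod or one that is not Running points to cause 1 or 3. For a NodePort service, the PORT(S) column shows the node port after the colon, so a value other than 31314 points to cause 2.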
We suggest the following steps to troubleshoot this issue.
1. Check with the cluster admin on the state of the clusterinfo pod/service, and confirm that the service is still listening on node port 31314.
2. If nothing has changed in the UMA agent installation, open a shell inside the clusterinfo pod (kubectl exec -it ...) and run "wget localhost:8080/up". If this URL returns no error, clusterinfo itself is healthy and there is a connectivity issue in the cluster.
3. If the above test fails, i.e. "wget localhost:8080/up" hangs, restart the clusterinfo pod and add a liveness probe to clusterinfo. Some versions of the UMA agent still run clusterinfo without a liveness probe.
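Steps 2 and 3 can be sketched as a few shell helpers. This is an illustration under stated assumptions, not the product's documented procedure. The function names, namespace and pod arguments are placeholders. The wget flags -q, -T 10 and -O /dev/null are additions to the plain command above, and -T may need to be dropped if the wget in the image does not accept it. The restart assumes the pod is managed by a controller that recreates it. The probe timing values are illustrative only; the path /up and port 8080 come from the check in step 2.

```shell
# Sketch only: $1 is the UMA namespace, $2 is the clusterinfo pod name.

# Run the /up check inside the clusterinfo pod and report the outcome.
clusterinfo_health() {
  ns="$1"; pod="$2"
  if kubectl exec -n "$ns" "$pod" -- \
       wget -q -T 10 -O /dev/null http://localhost:8080/up; then
    echo "clusterinfo answers on /up: look for a connectivity issue in the cluster"
  else
    echo "clusterinfo does not answer on /up: restart the pod"
    return 1
  fi
}

# Restart by deleting the pod.
# Assumption: a controller (for example a Deployment) recreates it.
clusterinfo_restart() {
  kubectl delete pod -n "$1" "$2"
}

# Print an illustrative livenessProbe fragment for the clusterinfo container.
# Timing values are assumptions; tune them for your cluster.
print_liveness_probe() {
  cat <<'EOF'
livenessProbe:
  httpGet:
    path: /up
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
EOF
}
```

A typical sequence is clusterinfo_health <namespace> <pod>, then clusterinfo_restart <namespace> <pod> if the check fails. The fragment printed by print_liveness_probe would go under the clusterinfo container in its workload spec, so that Kubernetes restarts the container automatically the next time /up stops responding.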