My Azure pipeline is not showing the needed metric. We are seeing other metrics, but the most important one is not showing. I need help as soon as possible.
1) RCA
Here is the RCA. I have completed all my action items, and we can meet if needed. I am awaiting next steps from you, including any follow-up questions or closure.
The issue was caused by not following the recommended heap settings (see previous email).
-Your APMIA wrapper logs show the JVM running out of memory:
INFO | jvm 12 | java.lang.OutOfMemoryError: GC overhead limit exceeded
STATUS | wrapper | The JVM has run out of memory. Restarting JVM.
-The JVM then created a heap dump:
INFO | jvm 12 | Dumping heap to logs/java_pid69652.hprof ...
INFO | jvm 12 | Heap dump file created [518425929 bytes in 2.437 secs]
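If it helps to confirm how often this happened, the wrapper log can be scanned for the same three messages shown above. The following is a minimal sketch only: the function name `summarize_wrapper_log` and the sample list are hypothetical, and the patterns are taken solely from the excerpts quoted in this email, so other message variants would not be matched.

```python
import re

# Patterns based only on the wrapper log lines quoted above (an assumption:
# other OutOfMemoryError or heap dump message formats are not covered).
OOM_RE = re.compile(r"java\.lang\.OutOfMemoryError: (?P<reason>.+)")
RESTART_RE = re.compile(r"The JVM has run out of memory\. Restarting JVM\.")
DUMP_RE = re.compile(r"Heap dump file created \[(?P<size>\d+) bytes in [\d.]+ secs\]")


def summarize_wrapper_log(lines):
    """Collect OOM reasons, count JVM restarts, and record heap dump sizes in bytes."""
    summary = {"oom_reasons": [], "restarts": 0, "dump_bytes": []}
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            summary["oom_reasons"].append(match.group("reason").strip())
            continue
        if RESTART_RE.search(line):
            summary["restarts"] += 1
            continue
        match = DUMP_RE.search(line)
        if match:
            summary["dump_bytes"].append(int(match.group("size")))
    return summary


# Sample input: the four lines quoted in this email.
sample = [
    "INFO | jvm 12 | java.lang.OutOfMemoryError: GC overhead limit exceeded",
    "STATUS | wrapper | The JVM has run out of memory. Restarting JVM.",
    "INFO | jvm 12 | Dumping heap to logs/java_pid69652.hprof ...",
    "INFO | jvm 12 | Heap dump file created [518425929 bytes in 2.437 secs]",
]
print(summarize_wrapper_log(sample))
```

To run it against a real log, pass an open file object instead of the sample list, since the function only needs an iterable of lines.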
-It looks like you changed the heap settings before this run:
12/21/22 09:56:32 AM EST
-In passing, note that you are also getting this message:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
This appears to be running a 99 agent:
12/21/22 01:23:59 AM EST [INFO] [InfrastructureAgent] CA APM Infrastructure Agent Release 99.99.0.nextgen_aws_restmon (Build 1272)
A 99 agent is usually a development build.
-Until you increased the heap, the Agent logs were filled with:
[ERROR] [IntroscopeAgent.RESTMon] [[email protected]@azure.3~~azure_datafactory ApmNodeProcessor] Error in getExternalIds for :%layer::AzurePipelineRun%//%:%//%$['value'][*]['pipelineRunId']%//%:%//%%pipelinename:%datafactoryname:%resgrp:%azure_subscriptionid
[ERROR] [IntroscopeAgent.RESTMon] [[email protected]@azure.3~~azure_datafactory ApmNodeProcessor] Error in getExternalIds for :%layer::AzureActivityRun%//%:%//%$['value'][*]['activity Name']%//%:%//%AzurePipelineRun%//%:%//%$['value'][*]['pipelineRunId']%//%:%//%%pipelinename:%datafactoryname:%resgrp:%azure_subscriptionid
So the agent clearly could not process the Azure metrics.
-The AutoProbe log only showed:
==============
> #######################
> # Agent Specification
> # ===================
> # This is the main agent implementation class which handles
> # the logging and tracing of instrumentation data.
>
> InstrumentTraceClass: com.wily.introscope.agent.AgentShim
>
Finished file dump for /com/wily/introscope/probebuilder/pbd/required.pbd
This may be due to the 99 agent. The AutoProbe log header reads:
Introscope AutoProbe Log: /opt/CA/apmia/logs/AutoProbe.log
Release 99.99.0.nextgen_aws_restmon (Build 1272)(Diagnos Version 2)
Log opened at Wed Dec 21 09:56:33 EST 2022
Copyright © 2021 Broadcom. All Rights Reserved.
I looked at the hprof file. The top memory consumers were:
The class "com.ca.ce.restmon.util.ParsedContext", loaded by "com.wily.util.extension.JarExtension$AllPermissionsClassLoaderChildFirst @ 0xe01d1588", occupies 136,460,440 (32.95%) bytes. The memory is accumulated in one instance of "java.util.HashMap$Node[]" loaded by "<system class loader>".
The classloader/component "com.wily.util.extension.JarExtension$AllPermissionsClassLoaderChildFirst @ 0xe008b440" occupies 104,042,192 (25.02%) bytes. The memory is accumulated in one instance of "java.lang.Object[]" loaded by "<system class loader>".
8 instances of "io.netty.buffer.PoolChunk", loaded by "com.wily.util.extension.JarExtension$AllPermissionsClassLoaderChildFirst @ 0xe008b440" occupy 67,244,064 (16.17%) bytes.
So the agent was holding a large number of objects in memory when it ran out of memory.
2) Sizing recommendations.
Did you see this in the documentation?
The APM Infrastructure Agent has the following Memory Requirements: The default minimum Java Heap Size is 256MB and the default maximum Java Heap Size is 512MB.
For Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), the recommended minimum and maximum heap size is 4GB.
You kept the minimum Java Heap Size at 256MB. I would change that value to 4 GB or 6 GB. If you need further guidance from Engineering, we can look into that. The maximum heap at 6 GB is working well, so raising the minimum should resolve the issue. Please set it and let me know if we can mark that aspect as closed.
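As a quick way to check the values before and after the change, the heap settings can be compared against the 4 GB recommendation quoted above. This is a sketch under assumptions: it assumes the heap is set through the Java Service Wrapper properties `wrapper.java.initmemory` and `wrapper.java.maxmemory` (values in MB) in the agent's wrapper configuration file, and the helper names and the example excerpt are hypothetical. Please confirm the actual property names and file location in your installation.

```python
RECOMMENDED_MB = 4096  # 4 GB, per the documentation quoted above

# Assumed property names (Java Service Wrapper convention, values in MB).
HEAP_KEYS = ("wrapper.java.initmemory", "wrapper.java.maxmemory")


def read_heap_settings(conf_text):
    """Return the heap-related properties found in the configuration text."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not key=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() in HEAP_KEYS:
            settings[key.strip()] = int(value.strip())
    return settings


def below_recommendation(settings, minimum_mb=RECOMMENDED_MB):
    """List the properties whose value is under the recommended size."""
    return sorted(key for key, value in settings.items() if value < minimum_mb)


# Hypothetical excerpt: minimum left at the default, maximum raised to 6 GB.
example_conf = """
# heap settings (illustrative values only)
wrapper.java.initmemory=256
wrapper.java.maxmemory=6144
"""
print(below_recommendation(read_heap_settings(example_conf)))
```

With the illustrative values above, only the minimum heap property is reported as below the recommendation, which matches the situation described in this email.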