The Mobile Banking application in Production experienced memory problems after an agent upgrade. There were no changes to the application itself. A memory leak is visible in the usage of the PS Old Gen memory pool. As soon as that memory region is close to full, GC times increase dramatically in an attempt to free up memory. It looks like there is an issue with the "Agent Heartbeat" from Introscope.
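The symptom described above (old generation filling up while GC time climbs) can be confirmed from inside the JVM with the standard java.lang.management MXBeans. The following is a minimal sketch, not part of the application or the agent; the class name OldGenWatch and the helper usedRatio are hypothetical names chosen for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper class: prints heap pool occupancy and cumulative GC time.
public class OldGenWatch {

    // Returns used/max for a pool, or -1.0 when the pool reports no defined maximum.
    static double usedRatio(MemoryUsage usage) {
        long max = usage.getMax();
        return max <= 0 ? -1.0 : (double) usage.getUsed() / max;
    }

    public static void main(String[] args) {
        // Heap pools; with the Parallel collector one of them is named "PS Old Gen".
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            System.out.printf("%s used=%d max=%d ratio=%.2f%n",
                    pool.getName(), usage.getUsed(), usage.getMax(), usedRatio(usage));
        }
        // Cumulative collection counts and times since JVM start, per collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Sampling these values periodically should show whether the old generation ratio keeps rising between full collections, which is the usual signature of a leak rather than a temporary load spike.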
There is no obvious evidence that this is related to the APM Agent.
The heap dump shows a 1.5 GB heap, 77% of which is occupied by a net.logstash.logback.util.ThreadLocalHolder. It holds an array of 73k entries, each holding a weak reference to a thread.
The heap dump contains 85k threads, most of them already dead, with names like "default task - nnn".
The problem does not seem to be caused by Introscope.
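To illustrate the pattern seen in the heap dump, the sketch below shows how a structure that records one entry per thread can keep growing when threads are short-lived and dead entries are never removed. This is an assumption-laden illustration only: it is NOT the actual implementation of net.logstash.logback.util.ThreadLocalHolder, and the class ThreadEntryRegistry and its methods are hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry, for illustration only; not the logstash-logback code.
public class ThreadEntryRegistry {

    // One entry per registered thread; nothing removes entries automatically.
    private final List<WeakReference<Thread>> entries = new ArrayList<>();

    public synchronized void register(Thread thread) {
        entries.add(new WeakReference<>(thread));
    }

    public synchronized int size() {
        return entries.size();
    }

    // Removes entries whose thread was collected or is not alive; returns how many were removed.
    // Note: a thread that has not been started yet also counts as not alive.
    public synchronized int pruneDead() {
        int before = entries.size();
        entries.removeIf(ref -> {
            Thread thread = ref.get();
            return thread == null || !thread.isAlive();
        });
        return before - entries.size();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadEntryRegistry registry = new ThreadEntryRegistry();
        // Simulate many short-lived worker threads, each registered once.
        for (int i = 0; i < 1000; i++) {
            Thread worker = new Thread(() -> { }, "default task - " + i);
            registry.register(worker);
            worker.start();
            worker.join();
        }
        System.out.println("entries before prune: " + registry.size());
        System.out.println("pruned: " + registry.pruneDead());
        System.out.println("entries after prune: " + registry.size());
    }
}
```

In this sketch every entry outlives its thread until pruneDead() is called explicitly. If a real holder behaves similarly, the number of entries would track the total number of threads ever created rather than the number currently running, which would be consistent with the 73k entries and the large number of dead threads observed.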
The following suggestions may help narrow down this issue: