We have observed numerous issues and instability since the upgrade to 24.10.
1) In the 21.6 version of the Agent, we used the following setting to capture JMX metrics from our various Kafka instances:
introscope.agent.kafka.jmx.include.filter=confluent-authorizer-metrics:*;kafka.admin.client:*;kafka.cluster:*;kafka.consumer:*;kafka.controller:*;kafka.coordinator.group:*;kafka.coordinator.transaction:*;kafka.databalancer:*;kafka.log:*;kafka.network:*;kafka.producer:*;kafka.rest:*;kafka.security:*;kafka.server:*;kafka.tier:*;kafka.tier.tasks:*;kafka.tier.tasks.archive:*;kafka.tier.tasks.delete:*;kafka.utils:*;java.lang:type=GarbageCollector,name=G1 Young Generation;java.lang:type=GarbageCollector,name=G1 Old Generation;java.lang:type=MemoryPool,name=G1 Old Gen;java.lang:type=MemoryPool,name=G1 Eden Space;java.lang:type=OperatingSystem;java.lang:type=Threading;org.apache.ZooKeeperService:*
When we apply this same JMX filter under the 24.10 version, the Agent JVM quickly runs out of heap, even after we increased the heap size to 2.5 GB.
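For reference, the 2.5 GB heap was applied via the agent JVM's maximum heap option, i.e. adding the following to the agent's Java command line (the exact launch script where this is set depends on how the agent process is started, so treat this as an illustrative sketch rather than the definitive setting):
-Xmx2560m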
2) To regain some stability after the issue in point #1, we reduced the include filter to the following:
introscope.agent.kafka.broker.broker1.jmx.include.filter=kafka.server:*;
combined with the following exclude filter:
introscope.agent.kafka.broker.broker1.jmx.exclude.filter=kafka.server:type=Request,client-id=*;kafka.cluster:type=Partition,name=CaughtUpReplicasCount,*;kafka.cluster:type=Partition,name=DeferredUnderMinIsr,*;kafka.cluster:type=Partition,name=BlockedOnMirrorSource,*;kafka.cluster:type=Partition,name=LastStableOffsetLag,*;kafka.cluster:type=Partition,name=MirrorReplicasCount,*;kafka.cluster:type=Partition,name=UnderMinIsrMirror,*;kafka.cluster:type=Partition,name=UnderReplicatedMirror,*;
This reduced heap memory consumption to a reasonable level; however, metric reporting is now extremely sporadic.
We have also set introscope.agent.remotejmx.softConfigSync.interval.seconds=0 and turned off debug logging.
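For clarity, the relevant agent profile settings now look like the following (the softConfigSync property is exactly as stated above; the log4j line is the standard IntroscopeAgent.profile logging property and is shown only as an assumed example of how debug logging was disabled, so the appender names should be checked against the actual profile):
introscope.agent.remotejmx.softConfigSync.interval.seconds=0
log4j.logger.IntroscopeAgent=INFO, console, logfile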