AIOps - kafka data consuming all disk space in Elastic nodes


Article ID: 222125


Updated On:


DX Operational Intelligence
DX Application Performance Management
CA App Experience Analytics



1) Spikes in OI Metric Publisher and Metadata


2) The Jarvis kafka data keeps growing very quickly, consuming all disk space on the Elastic nodes

On the Elastic nodes, run the following from the /dxi/jarvis/kafka folder:

du -h --max-depth=1 | sort -h

After 24 hours, the size of the files has increased by more than 100%.
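To quantify the growth instead of comparing two listings by eye, the two measurements can be turned into a percentage. This is a minimal sketch, assuming sizes are taken in kilobytes with du -sk; the function name growth_pct is illustrative and not part of the product:

```shell
# growth_pct OLD_KB NEW_KB
# Prints the integer percentage increase from OLD_KB to NEW_KB.
growth_pct() {
    old_kb=$1
    new_kb=$2
    if [ "$old_kb" -le 0 ]; then
        echo "old size must be positive" >&2
        return 1
    fi
    echo $(( (new_kb - old_kb) * 100 / old_kb ))
}

# Possible usage on an Elastic node (path taken from this article):
#   before=$(du -sk /dxi/jarvis/kafka | cut -f1)
#   ... wait 24 hours ...
#   after=$(du -sk /dxi/jarvis/kafka | cut -f1)
#   growth_pct "$before" "$after"
```

A result above 100 corresponds to the symptom described here: the data has more than doubled between the two measurements.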

What is the purpose of this data? How can its size be reduced? Why is the data not being processed? How can the problem be fixed?



DX Operational Intelligence 20.x
DX Application Performance Management 20.x
DX AXA 20.x


apmservices-OIMetricPublisher writes to the kafka topic 1MinMetrics*.

This topic is not read by Jarvis; it is processed by the dsp-integrator component for anomaly detection. The number of metrics exported to this topic is controlled by the oimetricpublisher regex configurations. In this case, the configured expressions probably match too many metrics and therefore generate a very large volume of metric data.

You can identify this condition from Cluster Management > Metrics View > apmservices | oimetricpublisher | 001 | OI Metric Publisher : Metrics Processed Per Interval

In this example, more than a million metrics are exported to the "1MinMetrics" topic, which causes the issue.



1) Change the oimetricpublisher configuration (from Cluster Manager) to reduce the number of metrics configured

Go to Cluster Manager (log in as masteradmin)

Go to Cluster Settings and locate the properties below:




Update the value, replacing the first expression below with the shorter second one:

Business Segment\|.*|By Business Service\|.*|Frontends\|Apps\|.*|By Frontend\|[^|]+\|Health:.*|CPU\|Processor.*:Utilization % \(aggregate\)|CPU:Utilization % \(process\)|GC Monitor.*|GC Heap.*|(.*)\|(Business Process|Business Service)\|(.*)\|Business Transactions\|(.*):(.*)|EJB\|(.*):Average Method Invocation Time \(ms\)|Backends(.*)|Frontends\|Messaging Services(.*)|JNDI(.*)|WebServices(.*)|Threads(.*)|Oracle Databases(.*) 


CPU\|Processor.*:Utilization % \(aggregate\)|CPU:Utilization % \(process\)|GC Monitor.*|GC Heap.*|


Apply the same change to all 3 properties above
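Before saving a reduced expression, it can help to check which metric names it would still select. The sketch below does this with grep on a few sample names. It rests on several assumptions: the product matches the whole metric name against the expression, grep's extended regex dialect behaves like the product's regex engine for these patterns, and the sample metric names are hypothetical. The trailing | of the reduced expression is left out here because an empty alternative would match an empty string.

```shell
# Reduced expression from this article, without the trailing "|".
REDUCED_REGEX='CPU\|Processor.*:Utilization % \(aggregate\)|CPU:Utilization % \(process\)|GC Monitor.*|GC Heap.*'

# count_matching_metrics: reads metric names on stdin, one per line,
# and prints how many of them match the whole expression.
count_matching_metrics() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -E -x -c -e "$REDUCED_REGEX" || true
}

# Possible usage with hypothetical metric names:
#   printf '%s\n' \
#       'CPU|Processor 0:Utilization % (aggregate)' \
#       'GC Heap:Bytes In Use' \
#       'Backends|db1:Average Response Time (ms)' | count_matching_metrics
```

With these three sample names, the two CPU and GC names match and the Backends name does not, which is the intended effect of dropping the broader alternatives.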

2) Reduce the kafka retention for the 1minMetrics topic from the default 24 hours to 12 hours or less

a) Connect to a kafka pod
If you are using OpenShift, go to the OpenShift console | Applications | Pods | <kafka pod> | Terminal
Otherwise, open a shell in any of the kafka pods:

kubectl get pods -n <dxi-namespace> | grep kafka
kubectl exec -ti <kafka-pod> -n <dxi-namespace> -- sh

b) First, reduce the retention for the topic to a very small value so that the existing kafka data is deleted

/opt/ca/kafka/bin/ --zookeeper jarvis-zookeeper:2181 --alter --topic 1minMetrics --config

Verify the change has been applied successfully:

/opt/ca/kafka/bin/ --zookeeper jarvis-zookeeper:2181 --describe --topic 1minMetrics

Wait 10 to 15 minutes, then check that the kafka size has been reduced using: du -h --max-depth=1 | sort -h
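The wait-and-check step can also be scripted so the size is sampled at a fixed interval. This is a minimal sketch, assuming du -sk is available on the node; the function names, the size limit, and the interval are illustrative choices, not values from the product:

```shell
# dir_size_kb DIR: prints the size of DIR in kilobytes.
dir_size_kb() {
    du -sk "$1" | cut -f1
}

# wait_until_below DIR LIMIT_KB INTERVAL_SEC MAX_CHECKS
# Samples the size of DIR up to MAX_CHECKS times, INTERVAL_SEC apart.
# Returns 0 as soon as the size is at or below LIMIT_KB, 1 otherwise.
wait_until_below() {
    dir=$1
    limit_kb=$2
    interval=$3
    checks=$4
    while [ "$checks" -gt 0 ]; do
        size_kb=$(dir_size_kb "$dir")
        echo "$(date '+%H:%M:%S') $dir: ${size_kb} KB"
        if [ "$size_kb" -le "$limit_kb" ]; then
            return 0
        fi
        checks=$((checks - 1))
        if [ "$checks" -gt 0 ]; then
            sleep "$interval"
        fi
    done
    return 1
}

# Possible usage: sample every 60 seconds for up to 15 minutes,
# with a hypothetical 10 GB target:
#   wait_until_below /dxi/jarvis/kafka 10485760 60 15
```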


c) Finally, set the retention to 4 hours (the default is 24 hours)

/opt/ca/kafka/bin/ --zookeeper jarvis-zookeeper:2181 --alter --topic 1minMetrics --config

Verify that the change has been applied successfully using: /opt/ca/kafka/bin/ --zookeeper jarvis-zookeeper:2181 --describe --topic 1minMetrics
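Apache Kafka expresses topic-level retention in milliseconds through its standard retention.ms setting, so the hour values mentioned in this step have to be converted. The sketch below shows that arithmetic only. It assumes retention.ms is the setting being changed in this section, and hours_to_ms is an illustrative helper, not a product command:

```shell
# hours_to_ms HOURS: prints HOURS converted to milliseconds.
hours_to_ms() {
    echo $(( $1 * 60 * 60 * 1000 ))
}

hours_to_ms 4    # prints 14400000
hours_to_ms 12   # prints 43200000
hours_to_ms 24   # prints 86400000
```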

Additional Information

DX Platform - Jarvis (kafka, zookeeper, elasticSearch) Troubleshooting