Issues in data collection for Kafka metrics


Article ID: 408897


Products

DX SaaS, DX OI SaaS, DX Operational Intelligence, DX APM SaaS, DX Application Performance Management

Issue/Introduction

  • Data collection for Kafka metrics fails and no data is shown for Kafka. Restarting the cluster-performance-prometheus-ng-XXXX pod did not bring the metrics back.
  • In this environment, Kafka metrics are exposed to Prometheus and then pushed to OI.

    The issue started after the Kafka pod restarted: on restart the Kafka pod was assigned a new IP address, but the UMA side was still scraping the old IP, although a few brokers continued to work.
  • From cluster-performance-prometheus-ng-XXXXXX-prometheus-exporter.log:

The error messages are:

level=warning msg="scraping prometheus metrics: http://yy.yy.yy.yy:8080/metrics (network-metrics-daemon-wfjgl)" component=Scraper error="Get \"http://yy.yy.yy.yy:8080/metrics\": dial tcp yy.yy.yy.yy:8080: connect: connection refused"
time="<timestamp>" level=warning msg="error while scraping target" component=Scraper error="Get \"http://yy.yy.yy.yy:8080/metrics\": dial tcp yy.yy.yy.yy:8080: connect: connection refused"
time="2025-08-19T06:37:42Z" level=warning msg="scraping prometheus metrics: https://zz.zz.zz.zz:8080/metrics (hostname.company.com)" component=Scraper error="Get \"https://zz.zz.zz.zz:8080/metrics\": dial tcp zz.zz.zz.zz:8080: connect: connection refused"
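When many targets are involved, it helps to pull the unreachable endpoints out of the exporter log in one pass. A minimal sketch, assuming the log format shown above (the sample lines and IPs here are illustrative, not from the affected environment):

```python
import re

# Two illustrative log lines in the exporter's format shown above.
SAMPLE_LOG = '''
level=warning msg="scraping prometheus metrics: http://10.1.2.3:8080/metrics (network-metrics-daemon-wfjgl)" component=Scraper error="Get \"http://10.1.2.3:8080/metrics\": dial tcp 10.1.2.3:8080: connect: connection refused"
time="2025-08-19T06:37:42Z" level=warning msg="error while scraping target" component=Scraper error="Get \"http://10.1.2.4:8080/metrics\": dial tcp 10.1.2.4:8080: connect: connection refused"
'''

def refused_targets(log_text):
    """Return the unique host:port pairs that refused a scrape connection."""
    return sorted(set(re.findall(
        r"dial tcp ([\d.]+:\d+): connect: connection refused", log_text)))

print(refused_targets(SAMPLE_LOG))
# → ['10.1.2.3:8080', '10.1.2.4:8080']
```

Comparing this list against the pods' current IPs (for example from `kubectl get pods -o wide`) quickly shows whether the exporter is scraping stale addresses.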

Environment

UMA agent 25.x

DX SaaS

 

Resolution

Validations done on the problematic environment:

  • Validated the logs of the prometheus-ng pod: it still holds the old IP addresses of the Kafka clusters, and errors are seen while scraping those endpoints.
  • Identified errors in the clusterinfo pod while it was fetching information about cluster resources. This appears to be the root cause: clusterinfo was not returning the latest information about the Kubernetes objects, so the exporter kept scraping stale IPs.
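The stale-IP hypothesis can be confirmed by comparing the IPs found in the exporter log against the pods' current IPs. A minimal sketch; the two dictionaries below are illustrative placeholders, to be filled from the exporter log and from `kubectl get pods -o wide` in a real environment:

```python
# IPs the exporter is still scraping (taken from the warning lines in the log).
scraped = {"kafka-0": "10.1.2.3", "kafka-1": "10.1.2.7"}
# IPs currently assigned to the pods (as reported by Kubernetes).
current = {"kafka-0": "10.1.2.9", "kafka-1": "10.1.2.7"}

# Any pod whose scraped IP differs from its current IP is being scraped stale.
stale = {pod: (scraped[pod], current[pod])
         for pod in scraped if scraped[pod] != current[pod]}

for pod, (old, new) in stale.items():
    print(f"{pod}: scraping stale IP {old}, pod now at {new}")
# → kafka-0: scraping stale IP 10.1.2.3, pod now at 10.1.2.9
```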

Steps performed on the problematic environment:

  • Restarted the clusterinfo pod and then the prometheus pod. After the restart there were no scraping errors for the Kafka clusters, and all Kafka cluster data was being pushed.
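The restart step above amounts to deleting the two pods so their controllers recreate them with a fresh view of cluster state. A minimal sketch that builds the equivalent `kubectl` commands; the namespace is a placeholder and the pod names keep the masked suffixes from this article, so adjust both to your environment:

```python
import subprocess

NAMESPACE = "<namespace>"  # placeholder: the namespace running the UMA components

def restart_commands(pods, namespace=NAMESPACE):
    """Build `kubectl delete pod` commands; deleting the pods lets their
    controllers recreate them, which re-reads current pod IPs."""
    return [["kubectl", "delete", "pod", pod, "-n", namespace] for pod in pods]

cmds = restart_commands(["clusterinfo-XXXX",
                         "cluster-performance-prometheus-ng-XXXX"])
for cmd in cmds:
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to execute against the cluster
```

After the pods come back up, re-check the exporter log: the scraping warnings for the Kafka endpoints should no longer appear.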