Wavefront collector is not able to scrape prometheus metrics from CoreDNS pods in a TKGI cluster

Article ID: 298724


Updated On:

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

This issue was reported in an environment with the following version information:
  • Ops Manager - v2.10.16-build.269
  • TKGI - v1.11.3
  • Wavefront collector - wavefronthq/wavefront-kubernetes-collector:1.3.4
  • Wavefront proxy - wavefronthq/proxy:9.7
The wavefront-collector pods report the following error in the logs:
time="2021-09-19T21:30:25Z" level=error msg="Error in scraping containers from 'prometheus_source: http://10.100.200.2:9153/metrics': Get \"http://10.100.200.2:9153/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"


How to identify that the wavefront-collector is unable to scrape Prometheus metrics from CoreDNS pods

Based on the error, you can identify the following details:

  • 10.100.200.2 - this is the cluster IP of the kube-dns service in the kube-system namespace
  • kubectl -n kube-system get pod <coredns-podname> -oyaml - shows that each coredns container defines a TCP port named metrics with container port 9153. This container port exposes Prometheus metrics from the container.
  • kubectl get svc -n kube-system kube-dns -o json | jq .spec.ports - shows that the kube-dns service has no port mapping for traffic received on port 9153, so requests to the service on that port time out (a manual check is sketched after the listing below). For reference, the initial ports array looks like the following:
[
  {
    "name": "dns",
    "port": 53,
    "protocol": "UDP",
    "targetPort": 53
  },
  {
    "name": "dns-tcp",
    "port": 53,
    "protocol": "TCP",
    "targetPort": 53
  }
]
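You can also reproduce the timeout manually to confirm that the service, rather than the CoreDNS pods themselves, is dropping the traffic. A rough sketch, assuming you can launch a throwaway curl pod in the cluster and that the CoreDNS pods carry the standard k8s-app=kube-dns label (substitute the service and pod IPs from your own environment):

# Through the service IP - times out, mirroring the collector error
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sv --max-time 5 http://10.100.200.2:9153/metrics

# Directly against a CoreDNS pod IP - returns Prometheus metrics
kubectl -n kube-system get pod -l k8s-app=kube-dns -o wide   # note a pod IP
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sv --max-time 5 http://<coredns-pod-ip>:9153/metrics

If the direct pod request succeeds while the service request times out, the missing service port mapping is confirmed.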


Environment

Product Version: 1.11
OS: Linux

Resolution

Note - If you are on v1.11.5 or later, or on any v1.12.x release, this issue is already fixed. The following instructions are a workaround for TKGI releases where the fix is not available.
  • Edit the kube-dns service (kubectl edit svc -n kube-system kube-dns) and add the following port entry to the ports array under spec:
- name: metrics
  port: 9153
  protocol: TCP
  targetPort: 9153
  • After saving the changes, the wavefront-collector will be able to scrape metrics from the coredns pods and the errors will stop. For reference, this is how the kube-dns service ports array looks after applying the workaround:
kubectl get svc -n kube-system kube-dns -o json | jq .spec.ports
[
  {
    "name": "dns",
    "port": 53,
    "protocol": "UDP",
    "targetPort": 53
  },
  {
    "name": "dns-tcp",
    "port": 53,
    "protocol": "TCP",
    "targetPort": 53
  },
  {
    "name": "metrics",
    "port": 9153,
    "protocol": "TCP",
    "targetPort": 9153
  }
]
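If you prefer a non-interactive change, the same port entry can be appended with kubectl patch instead of kubectl edit. A minimal sketch of the equivalent JSON patch (the trailing "-" in the path appends to the existing ports array):

kubectl -n kube-system patch svc kube-dns --type=json -p '[
  {"op": "add",
   "path": "/spec/ports/-",
   "value": {"name": "metrics", "port": 9153, "protocol": "TCP", "targetPort": 9153}}
]'

After either change, the scrape errors should stop appearing in the collector logs; check with kubectl logs against the wavefront-collector pods (the namespace and pod names depend on how the collector was deployed).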