Error: metrics not available yet when deploying metrics-server on a Tanzu Kubernetes Grid (TKG) workload cluster

Article ID: 297297

Products

VMware Tanzu Kubernetes Grid

Issue/Introduction

Description

Even after metrics-server has been given time to start (5-10 minutes, depending on the size of the workload cluster), the following errors are reported continuously when trying to fetch metrics for pods or nodes:

  • Metrics for nodes: "error: metrics not available yet"
  • Metrics for pods: "error: Metrics not available for pod"


This issue may present the following symptoms:

  • Symptom 1 - DNS (kube-dns) cannot resolve the host names of the cluster nodes to IP addresses. In this case, the logs from the metrics-server pod show:

    "unable to fetch node metrics for node <node-name>: no metrics known for node"
    "unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary", "dial tcp: lookup <node-name> on <node-IP>:53: no such host"

  • Symptom 2 - Even when DNS resolution works, validation of the kubelet certificate can fail. In this case, the logs from the metrics-server pod show:

    "x509: cannot validate certificate for <IP> because it doesn't contain any IP SANs"

Resolution

Both the DNS resolution and the certificate validation issues (Symptoms #1 and #2) can be resolved by editing the metrics-server deployment and adding the following flags in the "args" property:

  • To resolve the DNS errors, add --kubelet-preferred-address-types=InternalIP so that metrics-server contacts each kubelet by the node's internal IP address instead of its host name.
  • To resolve the certificate validation errors, add --kubelet-insecure-tls, which makes metrics-server skip verification of the kubelet serving certificate.
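
To decide which of the two flags is needed, the metrics-server logs can be searched for the messages quoted under Symptoms 1 and 2. The following is a minimal sketch rather than a supported tool: the function name suggest_metrics_server_flags is hypothetical, and it only matches the two log fragments shown in this article.

```shell
# Hypothetical helper: reads metrics-server log text on stdin and prints
# the flag that addresses each symptom it recognises.
# Returns 0 if at least one symptom was found, 1 otherwise.
suggest_metrics_server_flags() {
  local logs status=1
  logs="$(cat)"

  # Symptom 1: node host names cannot be resolved through DNS.
  if printf '%s\n' "$logs" | grep -qF 'no such host'; then
    echo "--kubelet-preferred-address-types=InternalIP"
    status=0
  fi

  # Symptom 2: the kubelet certificate has no IP SANs.
  if printf '%s\n' "$logs" | grep -qF "doesn't contain any IP SANs"; then
    echo "--kubelet-insecure-tls"
    status=0
  fi

  return "$status"
}
```

One way to use it, assuming the deployment lives in kube-system as in this article, is to pipe the logs into the function: kubectl logs deployment/metrics-server --namespace kube-system | suggest_metrics_server_flags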

Example 

Using kubectl, open the metrics-server deployment for editing with the command below:

kubectl -n kube-system edit deployment metrics-server


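Add the following two arguments to the "args" list of the metrics-server container and save the changes:

--kubelet-preferred-address-types=InternalIP
--kubelet-insecure-tls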
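
As a non-interactive alternative to kubectl edit, both flags can be appended with a JSON patch. This is only a sketch under stated assumptions: it assumes metrics-server is the first container (index 0) in the pod template and that the container already defines an "args" list. The function name patch_metrics_server and the KUBECTL override variable are hypothetical and are not part of kubectl.

```shell
# Hypothetical helper: appends both flags to the metrics-server container args.
# Assumes the container is at index 0 and already has an "args" list.
# KUBECTL may be overridden (for example KUBECTL=echo) to preview the call
# without contacting a cluster.
patch_metrics_server() {
  local patch='[
    {"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-preferred-address-types=InternalIP"},
    {"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}
  ]'

  "${KUBECTL:-kubectl}" patch deployment metrics-server \
    --namespace kube-system --type=json -p "$patch"
}
```

Calling patch_metrics_server applies the patch, which triggers a rollout of the deployment in the same way as saving the edit. Running the function twice appends the flags twice, so the interactive edit remains the safer choice when the current arguments are unknown.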



After the changes are applied to the cluster, wait 2-3 minutes for the metrics to be collected. To check that metrics-server is working, fetch the metrics for nodes or pods with the kubectl top node or kubectl top pods command. A successful output lists the current CPU and memory usage of each node or pod.
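
The wait can also be scripted by retrying kubectl top node until it succeeds. The sketch below is illustrative only: the function name wait_for_metrics, its default of 18 attempts spaced 10 seconds apart (about 3 minutes), and the KUBECTL override variable are assumptions, not values from the product documentation.

```shell
# Hypothetical helper: retries "kubectl top node" until it succeeds.
# $1 = maximum number of attempts (default 18)
# $2 = seconds to sleep between attempts (default 10)
# Returns 0 once metrics are served, 1 if every attempt fails.
wait_for_metrics() {
  local attempts="${1:-18}" delay="${2:-10}" i=1

  while [ "$i" -le "$attempts" ]; do
    # kubectl top exits non-zero while metrics are not available yet.
    if "${KUBECTL:-kubectl}" top node; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done

  return 1
}
```

For example, wait_for_metrics 30 10 would retry for roughly five minutes before giving up.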