Monitoring Persistent Volume Usage in VKS via Prometheus

Article ID: 427012


Products

VMware vSphere Kubernetes Service
VMware Tanzu Kubernetes Grid Management

Issue/Introduction

After deploying the Prometheus package in a VKS guest cluster, administrators may find that Persistent Volume (PV) metrics, specifically kubelet_volume_stats_used_bytes, return an "Empty query result" in the Prometheus UI. This occurs even though container metrics (CPU/memory) are successfully scraped from the /metrics/cadvisor endpoint.

Prerequisites:

General Requirements for Installing Packages on TKG Service clusters

Manage Package Repository

Install Cert Manager

Install Contour with Envoy

Prometheus Package Reference

 

Environment

vSphere Kubernetes Service

Tanzu Kubernetes Grid Management

Cause

The Kubelet architecture separates metrics into distinct endpoints based on the data source:

  • /metrics (Root Endpoint): This is the primary endpoint for Kubelet-specific infrastructure metrics, including Persistent Volume (PV) statistics (kubelet_volume_stats_*) and node health data.

  • /metrics/cadvisor Endpoint: This endpoint is dedicated to container-level resource usage metrics (CPU, memory, and internal container filesystem usage) provided by the embedded cAdvisor tool.

If a Prometheus Scrape Job is configured only for the /metrics/cadvisor path, it will not capture the volume utilization data because that specific data is managed by the Kubelet Volume Manager and exposed exclusively via the root /metrics path.
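
To see this separation directly, the same check used in Step 1 below returns nothing when pointed at the cadvisor endpoint (replace <node-name> with any worker node):

kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" | grep kubelet_volume_stats

No matches are expected here; the kubelet_volume_stats_* series appear only on the root /metrics path.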

Resolution

Step 1: Verify Metric Availability on the Node

Confirm that the data is exposed at the Kubelet level by querying the node proxy through the Kubernetes API server with kubectl. Replace <node-name> with the name of a worker node in your cluster:

kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" | grep kubelet_volume_stats


Expected Output: You should see lines starting with kubelet_volume_stats_ (including kubelet_volume_stats_used_bytes), each labeled with the PVC name and namespace.

kubelet_volume_stats_available_bytes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 1.91619072e+09
kubelet_volume_stats_available_bytes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 1.916198912e+09
kubelet_volume_stats_capacity_bytes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 2.040373248e+09
kubelet_volume_stats_capacity_bytes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 2.040373248e+09
kubelet_volume_stats_inodes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 131072
kubelet_volume_stats_inodes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 131072
kubelet_volume_stats_inodes_free{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 131059
kubelet_volume_stats_inodes_free{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 131059
kubelet_volume_stats_inodes_used{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 13
kubelet_volume_stats_inodes_used{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 13
kubelet_volume_stats_used_bytes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 32768
kubelet_volume_stats_used_bytes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 24576
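
To check every worker node at once, a small loop over the node list can be used (a convenience sketch; nodes with no mounted PVCs will report a count of 0):

# Count kubelet_volume_stats_used_bytes series exposed by each node's Kubelet
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  echo "${node}: $(kubectl get --raw "/api/v1/nodes/${node}/proxy/metrics" | grep -c kubelet_volume_stats_used_bytes)"
done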


Step 2: Configure a New Scrape Job for Volume Data

To capture the missing metrics, you must update your Prometheus configuration to target the root Kubelet metrics path.

A. Update the Configuration File

Obtain the values.yaml file used to deploy your Prometheus package (e.g., final-prometheus-data-values.yaml) and add the following job snippet under the prometheus.config.prometheus_yml.scrape_configs section:

- job_name: kubernetes-kubelet
  # Scrape settings for the request made through the API server proxy
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Copy node labels onto the scraped series
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Route the scrape through the Kubernetes API server
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # Scrape each node's root Kubelet metrics path via the node proxy
  - source_labels: [__meta_kubernetes_node_name]
    target_label: __metrics_path__
    replacement: /api/v1/nodes/$1/proxy/metrics

 

Job Configuration Details:

  • Job Name: kubernetes-kubelet

  • Metrics Path: /api/v1/nodes/$1/proxy/metrics (Targets the root Kubelet endpoint via the API server proxy; see the illustration below)
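
For context, the relabeling in the job above effectively rewrites each discovered node target as follows (illustrative placeholder values):

# Discovered target (role: node):  <node-internal-ip>:10250, path /metrics
# After relabel_configs:           kubernetes.default.svc:443, path /api/v1/nodes/<node-name>/proxy/metrics
#
# Prometheus therefore reaches the Kubelet's root /metrics endpoint through the
# API server proxy, authenticating with the ServiceAccount token and CA
# referenced in the job.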


B. Apply the Update via Tanzu CLI

Once the prometheus-data-values.yaml is updated with the new job configuration, apply the changes to the cluster using the following command:

tanzu package installed update -n vmware-system-tkg prometheus \
  --version 3.5.0+vmware.1-vks.2 \
  --values-file prometheus-data-values.yaml
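
If you need to confirm the installed package name or version before running the update, the installed packages can be listed first (same namespace as in the command above):

tanzu package installed list -n vmware-system-tkg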

 

C. Verify Reconciliation

Monitor the terminal output to ensure the package reconciles successfully.
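
Reconciliation can also be confirmed from the cluster by checking the PackageInstall resource directly (assuming the install is named prometheus in the vmware-system-tkg namespace, as in the update command above):

kubectl get packageinstall prometheus -n vmware-system-tkg

The DESCRIPTION column should report "Reconcile succeeded" once the update has been applied.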

 

Step 3: Post-Update Verification

Check Targets: Log in to the Prometheus UI and navigate to Status -> Targets. You should now see two distinct node-related jobs:

kubernetes-nodes-cadvisor (Scraping /metrics/cadvisor)

kubernetes-kubelet (Scraping /metrics via the API proxy)

Verify State: Ensure the new kubernetes-kubelet job shows a state of UP.

Run Query: Execute the following query to confirm data is now flowing:

kubelet_volume_stats_used_bytes
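
Once the metric returns data, it can be combined with the other kubelet_volume_stats_* series shown in Step 1. For example, the following PromQL expressions (illustrative, not part of the package defaults) report per-PVC utilization:

# Used space as a percentage of capacity, per PVC
100 * kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes

# PVCs with less than 10% of their capacity still available
kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes < 0.10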