Monitoring Persistent Volume Usage in VKS via Prometheus

Article ID: 427012


Products

VMware vSphere Kubernetes Service
VMware Tanzu Kubernetes Grid Management

Issue/Introduction

After deploying the Prometheus package in a VKS guest cluster, administrators may find that Persistent Volume (PV) metrics, specifically kubelet_volume_stats_used_bytes, return an "Empty query result" in the Prometheus UI. This occurs even though container metrics (CPU/memory) are successfully scraped from the /metrics/cadvisor endpoint.

Prerequisites:

General Requirements for Installing Packages on TKG Service clusters

Manage Package Repository

Install Cert Manager

Install Contour with Envoy

Prometheus Package Reference

 

Environment

vSphere Kubernetes Service

Tanzu Kubernetes Grid Management

Cause

The Kubelet architecture separates metrics into distinct endpoints based on the data source:

  • /metrics (Root Endpoint): This is the primary endpoint for Kubelet-specific infrastructure metrics, including Persistent Volume (PV) statistics (kubelet_volume_stats_*) and node health data.

  • /metrics/cadvisor Endpoint: This endpoint is dedicated to container-level resource usage metrics (CPU, memory, and internal container filesystem usage) provided by the embedded cAdvisor tool.

If a Prometheus Scrape Job is configured only for the /metrics/cadvisor path, it will not capture the volume utilization data because that specific data is managed by the Kubelet Volume Manager and exposed exclusively via the root /metrics path.
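
To see this separation directly, the same check used in Step 1 below returns nothing when pointed at the cadvisor endpoint (replace <node-name> with any worker node):

kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" | grep kubelet_volume_stats

No matches are expected here; the kubelet_volume_stats_* series appear only on the root /metrics path.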

Resolution

Step 1: Verify Metric Availability on the Node

Confirm that the data is exposed at the Kubelet level by querying the node proxy through the Kubernetes API server with kubectl. Replace <node-name> with the name of a worker node in your cluster:

kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" | grep kubelet_volume_stats


Expected Output: You should see lines starting with kubelet_volume_stats_ (including kubelet_volume_stats_used_bytes), each labeled with the PVC name and namespace.

kubelet_volume_stats_available_bytes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 1.91619072e+09
kubelet_volume_stats_available_bytes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 1.916198912e+09
kubelet_volume_stats_capacity_bytes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 2.040373248e+09
kubelet_volume_stats_capacity_bytes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 2.040373248e+09
kubelet_volume_stats_inodes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 131072
kubelet_volume_stats_inodes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 131072
kubelet_volume_stats_inodes_free{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 131059
kubelet_volume_stats_inodes_free{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 131059
kubelet_volume_stats_inodes_used{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 13
kubelet_volume_stats_inodes_used{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 13
kubelet_volume_stats_used_bytes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 32768
kubelet_volume_stats_used_bytes{namespace="<NAMESPACE>",persistentvolumeclaim="<PVC_NAME>"} 24576
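
To check every worker node at once, a small loop over the node list can be used (a convenience sketch; nodes with no mounted PVCs will report a count of 0):

# Count kubelet_volume_stats_used_bytes series exposed by each node's Kubelet
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  echo "${node}: $(kubectl get --raw "/api/v1/nodes/${node}/proxy/metrics" | grep -c kubelet_volume_stats_used_bytes)"
done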


Step 2: Configure a New Scrape Job for Volume Data

To capture the missing metrics, you must update your Prometheus configuration to target the root Kubelet metrics path.

A. Update the Configuration File

Obtain the values.yaml file used to deploy your Prometheus package (e.g., final-prometheus-data-values.yaml) and add the following job snippet under the prometheus.config.prometheus_yml.scrape_configs section:

- job_name: kubernetes-kubelet
  # Scrape settings for the request made through the API server proxy
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Copy node labels onto the scraped series
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Route the scrape through the Kubernetes API server
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # Scrape each node's root Kubelet metrics path via the node proxy
  - source_labels: [__meta_kubernetes_node_name]
    target_label: __metrics_path__
    replacement: /api/v1/nodes/$1/proxy/metrics

 

Job Configuration Details:

  • Job Name: kubernetes-kubelet

  • Metrics Path: /api/v1/nodes/$1/proxy/metrics (Targets the root Kubelet endpoint via the API server proxy; see the illustration below)
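
For context, the relabeling in the job above effectively rewrites each discovered node target as follows (illustrative placeholder values):

# Discovered target (role: node):  <node-internal-ip>:10250, path /metrics
# After relabel_configs:           kubernetes.default.svc:443, path /api/v1/nodes/<node-name>/proxy/metrics
#
# Prometheus therefore reaches the Kubelet's root /metrics endpoint through the
# API server proxy, authenticating with the ServiceAccount token and CA
# referenced in the job.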


B. Apply the Update via Tanzu CLI

Once the prometheus-data-values.yaml is updated with the new job configuration, apply the changes to the cluster using the following command:

tanzu package installed update -n vmware-system-tkg prometheus \
  --version 3.5.0+vmware.1-vks.2 \
  --values-file prometheus-data-values.yaml
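
If you need to confirm the installed package name or version before running the update, the installed packages can be listed first (same namespace as in the command above):

tanzu package installed list -n vmware-system-tkg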

 

C. Verify Reconciliation

Monitor the terminal output to ensure the package reconciles successfully.
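
Reconciliation can also be confirmed from the cluster by checking the PackageInstall resource directly (assuming the install is named prometheus in the vmware-system-tkg namespace, as in the update command above):

kubectl get packageinstall prometheus -n vmware-system-tkg

The DESCRIPTION column should report "Reconcile succeeded" once the update has been applied.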

 

Step 3: Post-Update Verification

Check Targets: Log in to the Prometheus UI and navigate to Status -> Targets. You should now see two distinct node-related jobs:

kubernetes-nodes-cadvisor (Scraping /metrics/cadvisor)

kubernetes-kubelet (Scraping /metrics via the API proxy)

Verify State: Ensure the new kubernetes-kubelet job shows a state of UP.

Run Query: Execute the following query to confirm data is now flowing:

kubelet_volume_stats_used_bytes
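
Once the metric returns data, it can be combined with the other kubelet_volume_stats_* series shown in Step 1. For example, the following PromQL expressions (illustrative, not part of the package defaults) report per-PVC utilization:

# Used space as a percentage of capacity, per PVC
100 * kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes

# PVCs with less than 10% of their capacity still available
kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes < 0.10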