Healthwatch (v2.3.4+) Missing API and DB Health Metrics for TKGI

Article ID: 435224

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

After upgrading Healthwatch to version 2.3.4 or later, system health metrics for the Tanzu Kubernetes Grid Integrated Edition (TKGI) control plane may no longer appear in Grafana dashboards or Prometheus query results.

Specifically, the following metrics return no data:

  • system_healthy{exported_job=~"pivotal-container-service"}

  • system_healthy{exported_job=~"pks-db"}
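
As a rough way to confirm the symptom (or, later, the fix), the two selectors above can be combined into one instant query against the Prometheus HTTP API (/api/v1/query) and the response checked for each exported_job value. The sketch below is illustrative only: the helper names build_query_url and missing_jobs are hypothetical, the base URL is a placeholder, and fetching the URL is left out because authentication to the Healthwatch Prometheus instance depends on the deployment.

```python
import json
from urllib.parse import urlencode

# The two exported_job values named in this article.
TKGI_JOBS = ("pivotal-container-service", "pks-db")


def build_query_url(base_url):
    """Return an instant-query URL covering both TKGI control plane jobs.

    base_url is a placeholder for wherever Prometheus is reachable (assumption).
    """
    selector = 'system_healthy{exported_job=~"%s"}' % "|".join(TKGI_JOBS)
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": selector})


def missing_jobs(response_body):
    """List the expected jobs that have no series in a /api/v1/query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    # Each entry in data.result carries its label set under "metric".
    seen = {series["metric"].get("exported_job") for series in payload["data"]["result"]}
    return [job for job in TKGI_JOBS if job not in seen]
```

An empty result vector makes missing_jobs return both job names, which matches the symptom described here. Once the metrics are restored, the returned list should be empty.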

Environment

Healthwatch
TKGI

Cause

Starting with Healthwatch version 2.3.4, automated collection of the TKGI control plane health metrics (API and database) depends on the TKGI Cluster Discovery feature.

If TKGI Cluster Discovery is not enabled in the Healthwatch tile settings, the default scrape configuration for these specific components is automatically omitted from the Prometheus runtime configuration during the deployment process. This results in the loss of the system_healthy metrics that were previously collected by default in versions 2.3.3 and earlier.

Resolution

To restore these metrics persistently, define a manual scrape job in the Healthwatch tile. Because the job is stored in the tile configuration, it is preserved across BOSH redeploys.

Steps to Restore Metrics

  1. Log in to Ops Manager and open the Healthwatch tile.

  2. Navigate to the Prometheus tab and scroll to Additional Scrape Jobs. Click Add.

  3. Fill in the fields as follows:

  • Scrape job configuration parameters: paste the following configuration.

    job_name: master_system_metrics_agent_direct_scrapes
    metrics_path: /metrics
    scheme: https
    tls_config:
      server_name: "system-metrics"
      ca_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_ca.pem"
      cert_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.pem"
      key_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.key"
    dns_sd_configs:
    - names:
        - q-s4.*.*.*.bosh.
      type: A
      port: 53035

    (Note: Do not include a leading dash - before job_name.)

  • Certificate PEM & Private Key PEM: Paste the client certificate and private key.

  • CA certificate for TLS: Paste the CA certificate.

  • Target server name: Enter system-metrics.

  • Skip TLS certificate verification: Leave Unchecked (False).

  4. Click Save.

  5. Return to the Installation Dashboard and click Apply Changes for the Healthwatch tile.
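
Before saving in step 3, the pasted scrape job text can be sanity-checked for the leading-dash mistake called out in the note. The following is a minimal sketch, not a YAML parser: the function name check_scrape_job is hypothetical, and EXPECTED_KEYS simply lists the top-level keys used by the job in this article, not keys that Prometheus itself requires.

```python
# Top-level keys used by the scrape job shown in this article (assumption:
# this list mirrors the article's example, not a Prometheus requirement).
EXPECTED_KEYS = ("job_name", "metrics_path", "scheme", "tls_config", "dns_sd_configs")


def check_scrape_job(text):
    """Return a list of problems found in pasted scrape job text (empty if none)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["configuration is empty"]

    problems = []
    first = lines[0]
    content = first.lstrip()
    if content.startswith("-"):
        problems.append("remove the leading dash before job_name")
        content = content[1:].lstrip()
    # Column at which top-level keys start, taken from the first key.
    indent = len(first) - len(content)

    keys = {content.split(":", 1)[0]} if ":" in content else set()
    for ln in lines[1:]:
        stripped = ln.lstrip()
        at_top_level = len(ln) - len(stripped) == indent
        # Skip list items such as "- names:" that sit at the same column.
        if at_top_level and not stripped.startswith("-") and ":" in stripped:
            keys.add(stripped.split(":", 1)[0])

    for key in EXPECTED_KEYS:
        if key not in keys:
            problems.append("missing expected key: " + key)
    return problems
```

Passing the configuration from step 3 exactly as shown should return an empty list. The same text with a dash in front of job_name should return only the leading-dash message.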

 

Additional Information

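The client certificate, private key, and CA certificate requested in step 3 can be copied from the /var/vcap/jobs/prometheus/config/certs/ directory on the TSDB VM. These are the director_direct_scrape files referenced by ca_file, cert_file, and key_file in the scrape job configuration.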