After upgrading Healthwatch to version 2.3.4 or later, system health metrics for the Tanzu Kubernetes Grid Integrated Edition (TKGI) control plane may no longer appear in Grafana dashboards or Prometheus query results.
Specifically, the following metrics return no data:
system_healthy{exported_job=~"pivotal-container-service"}
system_healthy{exported_job=~"pks-db"}
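To confirm the symptom, and later to confirm the fix after Apply Changes, you can run both queries against the Prometheus HTTP API instant-query endpoint (/api/v1/query). Below is a minimal Python sketch that uses only the standard library. PROMETHEUS_URL is a hypothetical placeholder. The sketch assumes the endpoint is reachable without authentication and presents a certificate your system already trusts, which may not hold in your environment.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

# Hypothetical placeholder: replace with the Prometheus address in your environment.
PROMETHEUS_URL = "https://prometheus.example.internal:9090"

# The two queries that return no data after the upgrade.
QUERIES = [
    'system_healthy{exported_job=~"pivotal-container-service"}',
    'system_healthy{exported_job=~"pks-db"}',
]


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def has_data(response: dict) -> bool:
    """True if the query succeeded and returned at least one series."""
    return response.get("status") == "success" and bool(response.get("data", {}).get("result"))


def run_query(base_url: str, promql: str) -> dict:
    # No authentication or custom CA handling; add what your environment requires.
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return json.load(resp)


def check_all(base_url: str = PROMETHEUS_URL) -> Dict[str, bool]:
    """Map each query to whether it currently returns data."""
    return {q: has_data(run_query(base_url, q)) for q in QUERIES}
```

Call check_all() with your Prometheus address. A value of False for a query means it returned no series, which matches the symptom described above.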
Environment: Healthwatch, TKGI
Starting with Healthwatch version 2.3.4, automatic collection of TKGI control plane health metrics (API and database) depends on the TKGI Cluster Discovery feature.
If TKGI Cluster Discovery is not enabled in the Healthwatch tile settings, the default scrape configuration for these components is omitted from the Prometheus configuration generated during deployment. As a result, the system_healthy metrics that versions 2.3.3 and earlier collected by default are no longer collected.
To restore these metrics persistently, define a manual scrape job in the Healthwatch tile instead of editing the Prometheus configuration on the VM. Configuration defined in the tile is preserved across BOSH redeploys.
Log in to Ops Manager and open the Healthwatch tile.
Navigate to the Prometheus tab and scroll to Additional Scrape Jobs. Click Add.
Fill in the fields as follows:
Scrape job configuration parameters: Paste the following configuration:
```yaml
job_name: master_system_metrics_agent_direct_scrapes
metrics_path: /metrics
scheme: https
tls_config:
  server_name: "system-metrics"
  ca_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_ca.pem"
  cert_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.pem"
  key_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.key"
dns_sd_configs:
  - names:
      - q-s4.*.*.*.bosh.
    type: A
    port: 53035
```
Note: Do not include a leading dash (-) before job_name. Paste the job as a single definition, not as a YAML list item.
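Certificate PEM and Private Key PEM: Paste the client certificate (director_direct_scrape_certificate.pem) and its private key (director_direct_scrape_certificate.key).
CA certificate for TLS: Paste the CA certificate (director_direct_scrape_ca.pem).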
Target server name: Enter system-metrics.
Skip TLS certificate verification: Leave unchecked (false).
Click Save.
Return to the Installation Dashboard and click Apply Changes for the Healthwatch tile.
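The steps call out one formatting pitfall: the scrape job must not start with a leading dash. Below is a rough Python sketch that lints a fragment before you paste it. It is not part of Healthwatch and is a line-based heuristic, not a YAML parser. It inspects only top-level (column 0) lines. REQUIRED_KEYS is an assumption that mirrors the job in this article; Prometheus itself requires only job_name.

```python
from typing import List

# Assumption: mirrors the job in this article. Prometheus itself only requires job_name.
REQUIRED_KEYS = ("job_name", "scheme", "tls_config", "dns_sd_configs")


def lint_scrape_job(fragment: str) -> List[str]:
    """Return problems found in a single-scrape-job YAML fragment (heuristic)."""
    problems: List[str] = []
    if "\t" in fragment:
        problems.append("tab characters found; YAML indentation must use spaces")
    # Top-level lines start in column 0; blank lines and comments are ignored.
    top_level = [
        line for line in fragment.splitlines()
        if line.strip() and not line[0].isspace() and not line.startswith("#")
    ]
    if not top_level:
        return problems + ["no top-level keys found"]
    # A top-level "- " turns the job into a list item, which the tile field does not expect.
    for line in top_level:
        if line.startswith("-"):
            problems.append(f"remove the leading dash: {line.strip()!r}")
    keys = {line.split(":", 1)[0] for line in top_level if ":" in line and not line.startswith("-")}
    for key in REQUIRED_KEYS:
        if key not in keys:
            problems.append(f"missing top-level key: {key}")
    return problems
```

For example, lint_scrape_job("- job_name: x") flags the leading dash. The configuration in the steps above, with its indentation intact, produces an empty list.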
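To obtain the certificates and private key for these fields, copy director_direct_scrape_certificate.pem, director_direct_scrape_certificate.key, and director_direct_scrape_ca.pem from the /var/vcap/jobs/prometheus/config/certs/ directory on the TSDB VM.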
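Before you paste the copied files into the tile, you can check that each one contains the expected PEM block type. Below is a minimal Python sketch. It assumes it runs on the TSDB VM, or against a copy of that directory, and that the file names match the tls_config paths in the scrape job. It checks only the PEM header. It does not verify that the certificate and key form a pair or that the certificate is still valid.

```python
from pathlib import Path
from typing import Dict, Optional

# Directory named in this article; pass a different path if you copied the files elsewhere.
CERT_DIR = Path("/var/vcap/jobs/prometheus/config/certs")

# Tile field -> file name, taken from the tls_config paths in the scrape job.
FIELD_FILES = {
    "CA certificate for TLS": "director_direct_scrape_ca.pem",
    "Certificate PEM": "director_direct_scrape_certificate.pem",
    "Private Key PEM": "director_direct_scrape_certificate.key",
}


def pem_label(text: str) -> Optional[str]:
    """Return the label of the first PEM block, e.g. 'CERTIFICATE', or None."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("-----BEGIN ") and line.endswith("-----"):
            return line[len("-----BEGIN "):-len("-----")]
    return None


def label_matches(field: str, label: Optional[str]) -> bool:
    """Keys may be labeled 'PRIVATE KEY', 'RSA PRIVATE KEY', etc.; certificates 'CERTIFICATE'."""
    if label is None:
        return False
    if field == "Private Key PEM":
        return label.endswith("PRIVATE KEY")
    return label == "CERTIFICATE"


def check_cert_dir(cert_dir: Path = CERT_DIR) -> Dict[str, bool]:
    """Map each tile field to whether its source file has the expected PEM type."""
    results = {}
    for field, name in FIELD_FILES.items():
        path = cert_dir / name
        text = path.read_text() if path.is_file() else ""
        results[field] = label_matches(field, pem_label(text))
    return results
```

A missing or unreadable-as-PEM file shows up as False for its field, which is a quick signal to recheck what was copied before clicking Save.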