Troubleshooting Healthwatch Grafana UI showing charts with "No data"

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

This article explains how to troubleshoot possible issues with Healthwatch Grafana UI.

Symptoms:
When accessing the Healthwatch Grafana UI, one or more charts show "No data" instead of the expected graph.

Environment

Product Version: 2.1, 2.3.*

Resolution

Checklist:
To find the source of the issue, go through the following steps.

Run "bosh ssh" to log in to the 'tsdb' VM in the Healthwatch deployment.
Open /var/vcap/jobs/prometheus/config/prometheus.yml, which is the Prometheus config file. It lists every scrape job that Prometheus runs against the different exporters, together with the access information for each one, e.g. the IPs, ports, and certificates needed to reach them.
Locate the exporter that should be exposing the missing metrics and write down its access information.
Send a curl request to that exporter and check whether the metrics are present, are actually missing, or the port is blocked.
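
As a starting point for the second step, the job names in the config can be listed in one pass. This is a minimal sketch, assuming each scrape job begins with a "- job_name:" line as in the excerpt shown later in this article; the function name list_scrape_jobs is hypothetical and not part of Healthwatch.

```shell
# List the scrape job names defined in a Prometheus config file.
# list_scrape_jobs is a hypothetical helper name, not a Healthwatch tool.
list_scrape_jobs() {
  config_file="$1"
  # Every scrape job starts with a "- job_name: <name>" entry;
  # print only the name and strip optional double quotes.
  sed -n 's/^[ \t]*-[ \t]*job_name:[ \t]*//p' "$config_file" | tr -d '"'
}
```

On the tsdb VM it could be called as: list_scrape_jobs /var/vcap/jobs/prometheus/config/prometheus.yml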

Here is an example. Suppose the "Bosh Unresponsive Agent" chart is showing "No data". SSH into the tsdb VM in the Healthwatch deployment and open /var/vcap/jobs/prometheus/config/prometheus.yml. The file contains sections for "director_direct_scrape" and "director_system_direct_scrape" with the following information.

- job_name: director_direct_scrape
  metrics_path: /metrics
  scheme: https
  tls_config:
    server_name: 10.###.##.###
    ca_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_ca.pem"
    cert_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.pem"
    key_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.key"
  static_configs:
    - targets:
        - "10.###.##.###:9091"
- job_name: director_system_direct_scrape
  metrics_path: /metrics
  scheme: https
  tls_config:
    server_name: "system-metrics"
    ca_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_ca.pem"
    cert_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.pem"
    key_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.key"
  static_configs:
    - targets:
        - "10.#.#.#:53035"
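
To pull the target address for one job out of a long config, something like the following could be used. It is a sketch that assumes targets are written as a block-style list under "- targets:", as in the excerpt above; it does not handle inline lists such as targets: ["a", "b"]. The function name job_targets is hypothetical.

```shell
# Print the static targets (host:port) configured for one scrape job.
# job_targets is a hypothetical helper; it assumes block-style YAML lists.
job_targets() {
  config_file="$1"
  job="$2"
  awk -v job="$job" '
    # A new job block starts: remember whether it is the one we want.
    /^[ \t]*-[ \t]*job_name:/ {
      name = $0
      sub(/^[ \t]*-[ \t]*job_name:[ \t]*/, "", name)
      gsub(/"/, "", name)
      in_job = (name == job)
      in_targets = 0
      next
    }
    # The targets list of the wanted job begins on the next line.
    in_job && /^[ \t]*-[ \t]*targets:/ { in_targets = 1; next }
    # Each list item under targets is one host:port entry.
    in_job && in_targets && /^[ \t]*-[ \t]/ {
      target = $0
      sub(/^[ \t]*-[ \t]*/, "", target)
      gsub(/"/, "", target)
      print target
      next
    }
    # Any other line ends the targets list.
    in_targets { in_targets = 0 }
  ' "$config_file"
}
```

For example, job_targets /var/vcap/jobs/prometheus/config/prometheus.yml director_system_direct_scrape would print the address that ends in :53035.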

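With that information we can build the following curl commands, one per scrape job. The -k flag tells curl to skip verification of the server certificate, which is why the target IP can be used directly even where the config sets a different server_name.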

curl -vk https://10.###.##.###:9091/metrics \
--cacert /var/vcap/jobs/prometheus/config/certs/director_direct_scrape_ca.pem \
--cert /var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.pem \
--key /var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.key

curl -vk https://10.#.#.#:53035/metrics \
--cacert /var/vcap/jobs/prometheus/config/certs/director_direct_scrape_ca.pem \
--cert /var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.pem \
--key /var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.key

From the output, we can see whether any metric is missing and then focus on the job that is not emitting it.
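
Scanning a long /metrics response by eye is error-prone, so the check can be scripted. The sketch below reads Prometheus text-format output on stdin and reports whether a given metric has at least one sample. The function name metric_present is hypothetical, and the metric name to look for depends on the chart being investigated; it is not specified in this article.

```shell
# Succeed when the scraped output contains at least one sample for a metric.
# metric_present is a hypothetical helper; it reads the curl output on stdin.
metric_present() {
  metric="$1"
  # A sample line starts with the metric name followed by "{" (labels) or a
  # space, so "# HELP" and "# TYPE" comment lines are not counted.
  grep -Eq "^${metric}(\{| )"
}
```

It could be used by piping the curl output into it, for example: curl -sk <same URL and certificate options as above> | metric_present <metric_name> && echo present || echo missing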

If the port is blocked by a firewall, ask the firewall team to open ports 9091 and 53035.
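
To tell a blocked port apart from a certificate problem, the exit code of curl can help. The following is a rough sketch based on curl's documented exit codes (7, 28, 35, 60); the wording of each diagnosis is an interpretation rather than something curl reports, and the function name describe_curl_exit is hypothetical.

```shell
# Map a curl exit code to a likely cause of a failed scrape.
# describe_curl_exit is a hypothetical helper; the diagnoses are hints only.
describe_curl_exit() {
  case "$1" in
    0)     echo "ok: endpoint reachable and request completed" ;;
    7)     echo "connect failed: port closed, exporter not listening, or connection rejected" ;;
    28)    echo "timeout: traffic may be silently dropped by a firewall" ;;
    35|60) echo "tls: the port is reachable but the TLS handshake or certificate check failed" ;;
    *)     echo "other: see the curl man page for exit code $1" ;;
  esac
}

# Example usage on the tsdb VM (not run here):
#   curl -sk --connect-timeout 5 -o /dev/null https://10.###.##.###:9091/metrics ...
#   describe_curl_exit $?
```

A result of 7 or 28 for both ports points toward the network path or firewall, while 35 or 60 suggests the port is open and the certificates should be checked instead.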