Checklist:
To find the source of the issue, go through the following steps.
Consider an example. Suppose the "Bosh Unresponsive Agent" chart shows "No data". SSH into the tsdb VM in the Healthwatch deployment and open /var/vcap/jobs/prometheus/config/prometheus.yml. The file contains a "director_direct_scrape" section with the following information.
- job_name: director_direct_scrape
  metrics_path: /metrics
  scheme: https
  tls_config:
    server_name: 10.###.##.###
    ca_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_ca.pem"
    cert_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.pem"
    key_file: "/var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.key"
  static_configs:
  - targets:
    - "10.###.##.###:9091"
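With that information, we can build the following curl command. The -v flag prints the request and TLS handshake details, and -k skips verification of the server certificate.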
curl -vk https://10.###.##.###:9091/metrics \
  --cacert /var/vcap/jobs/prometheus/config/certs/director_direct_scrape_ca.pem \
  --cert /var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.pem \
  --key /var/vcap/jobs/prometheus/config/certs/director_direct_scrape_certificate.key
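The mapping from the scrape job to the curl command is mechanical, so it can be sketched in a few lines of Python. This is only an illustration: the function name build_curl_command is hypothetical, the job is assumed to be already parsed into a dictionary (for example with a YAML library), and only the first static target is used.

```python
import shlex


def build_curl_command(job: dict) -> list[str]:
    """Build a curl argument list that mimics one Prometheus static scrape job.

    Hypothetical helper: assumes `job` is one entry of scrape_configs,
    already parsed into a dict, with at least one static target.
    """
    tls = job.get("tls_config", {})
    target = job["static_configs"][0]["targets"][0]
    # Prometheus defaults to scheme "http" and metrics_path "/metrics".
    scheme = job.get("scheme", "http")
    path = job.get("metrics_path", "/metrics")
    cmd = ["curl", "-vk", f"{scheme}://{target}{path}"]
    # Map each tls_config key to the matching curl option.
    for key, flag in (("ca_file", "--cacert"),
                      ("cert_file", "--cert"),
                      ("key_file", "--key")):
        if key in tls:
            cmd += [flag, tls[key]]
    return cmd


# The job from the example, with the redacted address kept as-is.
certs = "/var/vcap/jobs/prometheus/config/certs"
job = {
    "job_name": "director_direct_scrape",
    "metrics_path": "/metrics",
    "scheme": "https",
    "tls_config": {
        "server_name": "10.###.##.###",
        "ca_file": f"{certs}/director_direct_scrape_ca.pem",
        "cert_file": f"{certs}/director_direct_scrape_certificate.pem",
        "key_file": f"{certs}/director_direct_scrape_certificate.key",
    },
    "static_configs": [{"targets": ["10.###.##.###:9091"]}],
}

# shlex.join quotes arguments so the result can be pasted into a shell.
print(shlex.join(build_curl_command(job)))
```

The same helper can be pointed at any other static scrape job in prometheus.yml to test that job's endpoint by hand.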
From the output, we can see whether any metric is missing and then focus on the job that is not emitting it.
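Checking the output by eye is error-prone when the endpoint returns many metrics. A minimal sketch of an automated check follows, assuming the response body is in the Prometheus text exposition format. The function names and the metric names in the sample are placeholders for illustration, not actual Healthwatch metric names.

```python
def metric_names(exposition_text: str) -> set[str]:
    """Collect the metric names found in Prometheus text exposition output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first "{" (labels) or at whitespace (value).
        parts = line.split("{", 1)[0].split()
        if parts:
            names.add(parts[0])
    return names


def missing_metrics(exposition_text: str, expected: list[str]) -> list[str]:
    """Return the expected metric names that do not appear in the output."""
    return sorted(set(expected) - metric_names(exposition_text))


# Placeholder sample; real input would be the body returned by the endpoint.
sample = """\
# HELP example_agent_up Hypothetical gauge.
# TYPE example_agent_up gauge
example_agent_up{deployment="cf"} 1
example_scrape_total 42
"""

print(missing_metrics(sample, ["example_agent_up", "example_unresponsive_agents"]))
# prints ['example_unresponsive_agents']
```

To use it against a real endpoint, save the response body to a file, read the file contents, and pass them as the first argument along with the metric names the chart depends on.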