Troubleshooting:
SSH to one of the TSDB VMs and run the following curl command, replacing __BOSH_DIRECTOR_IP__ with the actual BOSH Director IP:
curl -vk https://__BOSH_DIRECTOR_IP__:53035/metrics \
  --cacert /var/vcap/jobs/prometheus/config/certs/prometheus_ca.pem \
  --cert /var/vcap/jobs/prometheus/config/certs/prometheus_certificate.pem \
  --key /var/vcap/jobs/prometheus/config/certs/prometheus_certificate.key
# TYPE system_cpu_core_idle gauge
system_cpu_core_idle{cpu_name="cpu0",deployment="p-bosh-<GUID>",index="fdcc400c-****-****-****-************",ip="",job="loggr-system-metrics-agent",origin="system_metrics_agent",source_id="system_metrics_agent",unit="Percent"} 99.53208556145128
system_cpu_core_idle{cpu_name="cpu1",deployment="p-bosh-<GUID>",index="fdcc400c-****-****-****-************",ip="",job="loggr-system-metrics-agent",origin="system_metrics_agent",source_id="system_metrics_agent",unit="Percent"} 98.32775919731921
......
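To confirm which deployment identifier your environment reports, you can filter the scraped metrics for the deployment label. A minimal sketch; the sample line below is taken from the output above (GUID masked as in this article), and the grep/sed patterns are generic, not Healthwatch-specific:

```shell
# One metric line from the curl output above (shortened for readability).
line='system_cpu_core_idle{cpu_name="cpu0",deployment="p-bosh-<GUID>",ip=""} 99.53'

# grep -o prints only the matching part of the line;
# sed then strips the label name and surrounding quotes.
echo "$line" | grep -o 'deployment="[^"]*"' | sed 's/deployment="\(.*\)"/\1/'
# Prints: p-bosh-<GUID>
```

If this prints p-bosh-GUID rather than p-bosh, the environment is affected by the issue described below.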
Root Cause:
Prior to Ops Manager 1.8.4, the BOSH product had an ID of p-bosh-GUID. In Ops Manager 1.8.4 and later, new installations use an ID of just p-bosh. Healthwatch 2.x currently expects the BOSH Director deployment name to be p-bosh, but any environment that was originally installed before Ops Manager 1.8.4 and has been upgraded since still retains the p-bosh-GUID identifier. As a result, the query in Grafana returns 'No data' because it matches only p-bosh, not p-bosh-GUID.
This p-bosh-GUID ID issue is planned to be fixed in the next regular Healthwatch release, expected around the beginning of June 2023.
Temporary workaround:
You can clone the dashboard and modify the query to use a regex matcher: system_healthy{deployment=~"p-bosh.*"}.
Note: the cloned dashboard is deleted after each reboot of the Grafana VM, which is why this is only a temporary workaround.
1. Click 'Dashboard settings'.
2. Click 'Save As'.
3. Save the new dashboard.
4. Open the new dashboard.
5. Click 'Edit' on the panel.
6. Click 'Add query'.
7. Enter system_healthy{deployment=~"p-bosh.*"} and click 'Refresh dashboard'.
8. Click 'Apply'.
9. The BOSH Director status now shows Running.
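The regex matcher in step 7 works because deployment=~"p-bosh.*" matches both the new p-bosh name and the legacy p-bosh-GUID name, whereas an exact matcher deployment="p-bosh" matches only the former. A quick shell analogy using grep -E (the deployment names are the ones discussed above; PromQL regex matchers are fully anchored, which the pattern below reproduces explicitly):

```shell
# PromQL anchors =~ regexes on both ends, i.e. "p-bosh.*" behaves as ^p-bosh.*$
pattern='^p-bosh.*$'

echo 'p-bosh'        | grep -Eq "$pattern" && echo 'new name matches'
echo 'p-bosh-<GUID>' | grep -Eq "$pattern" && echo 'legacy name matches'
echo 'other-deploy'  | grep -Eq "$pattern" || echo 'unrelated name does not match'
```

This is why the cloned dashboard shows data in environments that still carry the p-bosh-GUID identifier.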