System metrics from BOSH deployed VMs become unavailable on Grafana post upgrade to Healthwatch v2.2.9

Article ID: 298155

Updated On:

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

After the Healthwatch tile was upgraded to v2.2.9, system metrics from BOSH-deployed VMs in a particular isolation segment became unavailable on the Grafana dashboard. System metrics from BOSH-deployed VMs in other deployments were still visible. As suggested by the breaking change in the release notes, the Enable System Metrics option in the Director Config tab of the BOSH Director tile was already checked.

[Breaking Change] Healthwatch now requires the use of system-metrics-agent processes to gather “system” metrics from BOSH deployed VMs. Please make sure to enable Enable System Metrics in the Director Config tab of the BOSH Director tile. Ensure that Apply Changes is run on all tiles and that all VMs deployed via service brokers are upgraded prior to upgrading Healthwatch. The impact of not performing this action is that Healthwatch dashboards will fail to populate with metrics about VM health, cpu, memory, disk and other statistics. 

Environment

Product Version: 4.0

Resolution

According to the System Metrics Agents architecture, a Loggregator deployment that uses System Metrics Agents has the following components, as shown in the diagram below:
  • System Metrics Agent: A standalone agent to provide VM system metrics using a Prometheus-scrapeable endpoint.

  • System Metrics Scraper: The System Metrics Scraper forwards metrics from System Metrics Agents to Loggregator Agents over mTLS.


[Diagram: System Metrics Agents architecture]

So if the system-metrics-agent job is running normally on the affected BOSH-deployed VMs, it is likely that the System Metrics Scraper cannot scrape system metrics from those VMs. Each VM exposes these metrics on port 53035. On a TAS foundation, the System Metrics Scraper is deployed on the clock_global instance as the job loggr-system-metric-scraper. Therefore, review the logs of the loggr-system-metric-scraper job for any errors about scraping metrics from system-metrics-agent.

In this particular case, the affected isolation segment was placed on a dedicated subnet, and a firewall blocked incoming traffic to port 53035 on that subnet. The related firewall rule should be modified to allow this incoming traffic.
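
To check whether port 53035 is reachable before and after changing the firewall rule, a plain TCP connection test can help. The following is a minimal sketch, not part of the product. The function names are hypothetical, and it assumes Python 3 is available on a host in the same network position as the clock_global instance. A successful connection only shows that the port is open. It does not prove that the mTLS scrape itself succeeds.

```python
import socket

# Port on which system-metrics-agent exposes metrics, as stated in this article.
SYSTEM_METRICS_AGENT_PORT = 53035


def can_connect(host, port=SYSTEM_METRICS_AGENT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers refused connections, timeouts (typical when a
        # firewall silently drops packets) and unreachable or unknown hosts.
        return False


def check_hosts(hosts, port=SYSTEM_METRICS_AGENT_PORT, timeout=3.0):
    """Return a dict mapping each host to its reachability result."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

For example, calling check_hosts with the IP addresses of the VMs in the affected isolation segment returns False for each VM whose port 53035 is blocked or closed, and True once the traffic is allowed.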