How to scrape BOSH Director system metrics via a Prometheus endpoint

Article ID: 293818

Products

Operations Manager

Issue/Introduction

Although TAS operators can view BOSH Director stats (CPU, memory, and disk usage) on the Ops Manager web console (Director tile -> Status), they sometimes need to scrape those metrics and forward them to external monitoring or logging systems (e.g. Splunk, Prometheus). Older Ops Manager releases provide no way to retrieve system metrics from the BOSH Director. Starting with Ops Manager v2.10.9, a job called loggr-system-metrics-agent runs on the BOSH Director; it collects system metrics and exposes them on a Prometheus endpoint. However, those system metrics are currently not forwarded to a TAS Loggregator component, so they cannot be retrieved from the Firehose with a nozzle.

This article provides a temporary way to scrape BOSH Director system metrics that have been exposed by loggr-system-metrics-agent.

Environment

Product Version: 2.10

Resolution

The loggr-system-metrics-agent exposes local system metrics on a Prometheus-scrapable endpoint at port 53035. Curling the "/metrics" endpoint will retrieve the system metrics:

bosh/0:~# curl -k https://localhost:53035/metrics --cacert /var/vcap/jobs/loggr-system-metrics-agent/config/certs/system_metrics_agent_ca.crt --key /var/vcap/jobs/loggr-system-metrics-agent/config/certs/system_metrics_agent.key --cert /var/vcap/jobs/loggr-system-metrics-agent/config/certs/system_metrics_agent.crt
# HELP system_cpu_core_idle vm metric
# TYPE system_cpu_core_idle gauge
system_cpu_core_idle{cpu_name="cpu0",deployment="p-bosh",index="0379a4cf-8bb8-4412-6129-c52f757a0507",ip="",job="loggr-system-metrics-agent",origin="system_metrics_agent",source_id="system_metrics_agent",unit="Percent"} 95.72431357745269
system_cpu_core_idle{cpu_name="cpu1",deployment="p-bosh",index="0379a4cf-8bb8-4412-6129-c52f757a0507",ip="",job="loggr-system-metrics-agent",origin="system_metrics_agent",source_id="system_metrics_agent",unit="Percent"} 95.79949922506043
.......
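
If the curl output needs to be post-processed (for example, to forward selected values to another system), the text can be split into metric name, labels, and value. The following is a minimal Python sketch, not part of the product; the function name parse_metrics is hypothetical, and it assumes the simple one-sample-per-line layout shown above without unescaping label values.

```python
import re

# One sample line: metric name, optional {labels}, then the numeric value.
_SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces, e.g. cpu_name="cpu0".
_LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from the endpoint output."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that does not look like a sample
        name, raw_labels, raw_value = match.groups()
        try:
            value = float(raw_value)
        except ValueError:
            continue  # skip lines whose value is not numeric
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, value))
    return samples
```

Calling parse_metrics on the output shown above would yield one tuple per system_cpu_core_idle line, with cpu_name, deployment, and the other labels available as dictionary keys.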


You can also curl the "/metrics" endpoint from another host on the network by using the BOSH Director external IP address instead of localhost.
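
As an illustration of a remote scrape without curl, the sketch below uses only the Python standard library. It assumes the certificate and key files have already been copied from the BOSH Director to the local machine, and it disables server certificate verification in the same way that `curl -k` does. The names metrics_url and fetch_metrics are hypothetical.

```python
import ssl
import urllib.request

METRICS_PORT = 53035  # port of the loggr-system-metrics-agent endpoint


def metrics_url(director_ip, port=METRICS_PORT):
    """Build the HTTPS URL of the /metrics endpoint."""
    return f"https://{director_ip}:{port}/metrics"


def fetch_metrics(director_ip, cert_file, key_file, timeout=10):
    """Fetch the raw metrics text, presenting the agent certificate and key."""
    context = ssl.create_default_context()
    # Equivalent of `curl -k`: do not verify the server certificate.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    # Client certificate and key, as passed to curl with --cert and --key.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    url = metrics_url(director_ip)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (paths are local copies of the files on the BOSH Director):
# text = fetch_metrics("<BOSH Director external IP>",
#                      "system_metrics_agent.crt", "system_metrics_agent.key")
```

Skipping verification keeps the sketch aligned with the curl command above; in a stricter setup the CA file could be loaded into the context instead, provided the server certificate matches the address being used.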

If you have deployed Prometheus in your environment, you can configure Prometheus to scrape BOSH Director system metrics via the exposed endpoint.

Healthwatch v2 deploys a Prometheus job on its tsdb instances, but it does not currently scrape system metrics from the BOSH Director (this feature is in the product team's plan).

As a workaround, Prometheus in Healthwatch v2 can be manually configured to scrape the BOSH Director system metrics. 

1. Get the following certificate & key files from the BOSH Director:

  • /var/vcap/jobs/loggr-system-metrics-agent/config/certs/system_metrics_agent_ca.crt
  • /var/vcap/jobs/loggr-system-metrics-agent/config/certs/system_metrics_agent.key
  • /var/vcap/jobs/loggr-system-metrics-agent/config/certs/system_metrics_agent.crt


2. Add a new job on the Healthwatch2 tile (Settings -> Prometheus Configuration -> Additional Scrape Config Jobs) with the following info:

job_name: bosh-director
metrics_path: /metrics
scheme: https
static_configs:
  - targets:
    - "<BOSH Director external IP>:53035"


Replace <BOSH Director external IP> with the BOSH Director external IP address. Fill in the certificate and private key sections with the contents of the files retrieved in the previous step. Select the "TLS Config Skip SSL Validation" checkbox.

3. Run "Apply Changes" against the Healthwatch2 tile.

4. Shortly after "Apply Changes" completes successfully, Prometheus starts scraping the BOSH Director system metrics and they become visible in Grafana. You can add a new dashboard with several panels to show those metrics. For example, add a panel with the following query to show system_cpu_user:

system_cpu_user{origin="system_metrics_agent", deployment=~"p-bosh"} 

[Image: bosh-metrics]
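
The same query can also be checked outside Grafana through the Prometheus HTTP API instant-query endpoint (/api/v1/query). The sketch below only builds the request URL and reads a response body; the base URL is a placeholder, the function names are hypothetical, and how the Healthwatch Prometheus instance is reached and authenticated in a given environment is not covered here.

```python
import json
import urllib.parse


def instant_query_url(base_url, promql):
    """Build a Prometheus instant-query URL for the given PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def extract_values(response_body):
    """Return (labels, value) pairs from an instant-query JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


# Placeholder base URL; replace with the address of your Prometheus instance.
PROMQL = 'system_cpu_user{origin="system_metrics_agent", deployment=~"p-bosh"}'
url = instant_query_url("http://prometheus.example:9090", PROMQL)
```

An empty result list from extract_values would suggest that the bosh-director scrape job has not yet collected any samples.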

A list of system metrics exposed by the endpoint is available in these docs.