Metric Name Breaking Changes starting in Tanzu RabbitMQ for VMs Tile v2.x

Article ID: 293225


Products

VMware RabbitMQ

Issue/Introduction

This article covers the breaking changes (see Breaking Changes - Tanzu RabbitMQ for VMs v2.0.0) to metrics emitted by RabbitMQ Service Instances starting in Tile version 2.x. The two changes of interest are:

  1. Metric name format.
  2. Metric source (Prometheus plugin).
When upgrading from RabbitMQ Tile version 1.x to version 2.x, RabbitMQ metrics may go "missing." This article aims to address those "missing" metrics.

Environment

Product Version: 2.0

Resolution

Metric name format

The formatting of metric names has changed: the forward slashes, dots, and hyphens in a metric name all become underscores.
#Prior to RabbitMQ v2.x
/on-demand-broker/p.rabbitmq/single-node/total_instances

#After upgrading to RabbitMQ v2.x+
_on_demand_broker_p_rabbitmq_single_node_total_instances
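
As a rough illustration of the character substitution (the tile performs the conversion itself; this one-liner just reproduces it for the example above):
$ echo '/on-demand-broker/p.rabbitmq/single-node/total_instances' | sed 's#[/.-]#_#g'
_on_demand_broker_p_rabbitmq_single_node_total_instances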

In addition to the formatting changes, some of the metric names themselves have changed to be more general while utilizing tags for specific details.

For example, consider the following firehose output for queue depth to see how the name has changed. The queue name is "Queue_Created_From_App3" in "service-instance_d26c2a27-5c03-47c6-8c0f-390d987ef624".
#Prior to RabbitMQ v2.x
origin:"p.rabbitmq" eventType:ValueMetric timestamp:1642091169930452500 deployment:"service-instance_d26c2a27-5c03-47c6-8c0f-390d987ef624" job:"rabbitmq-server" index:"18029ff7-cacc-40f8-8131-16e88ae92d92" ip:"172.36.2.13" tags:<key:"source_id" value:"d26c2a27-5c03-47c6-8c0f-390d987ef624" > valueMetric:<name:"/p.rabbitmq/rabbitmq/queues/d26c2a27-5c03-47c6-8c0f-390d987ef624/Queue_Created_From_App3/depth" value:100 unit:"count" >  
 

#After upgrading to RabbitMQ v2.x+
origin:"p.rabbitmq" eventType:ValueMetric timestamp:1644331519773763806 deployment:"service-instance_d26c2a27-5c03-47c6-8c0f-390d987ef624" job:"rabbitmq-server" index:"18029ff7-cacc-40f8-8131-16e88ae92d92" ip:"172.36.2.13" tags:<key:"instance_id" value:"rabbit@ec7a4fcc-7458-4264-a7e0-07f8f379a0c2.rabbitmq-server.service.service-instance-d26c2a27-5c03-47c6-8c0f-390d987ef624.bosh" > tags:<key:"queue" value:"Queue_Created_From_App3" > tags:<key:"source_id" value:"rabbit@localhost" > tags:<key:"vhost" value:"d26c2a27-5c03-47c6-8c0f-390d987ef624" > valueMetric:<name:"rabbitmq_queue_messages_ready" value:100 unit:"" >
 
Specifically, the following metric name has changed from:
/p.rabbitmq/rabbitmq/queues/d26c2a27-5c03-47c6-8c0f-390d987ef624/Queue_Created_From_App3/depth

The metric name now shows as:
rabbitmq_queue_messages_ready
 - tag <key:"queue" value:"Queue_Created_From_App3" >
 - tag <key:"vhost" value:"d26c2a27-5c03-47c6-8c0f-390d987ef624" >
 - ...and so on for the remaining tags

Note: Some metric ingestion systems may not append tags by default and may require a toggle to include tags when forwarding metrics downstream.

For example, the Splunk tile for Tanzu has this option. For the Splunk nozzles to include tags along with the metrics sent to the downstream forwarders, this setting must be enabled in the tile.

It is also worth noting that queue metrics are aggregated by default starting in RabbitMQ tile version 2.x. This means the example metric rabbitmq_queue_messages_ready above is not available on a "per queue" basis by default; "per queue" metrics must be enabled explicitly. This behavior comes from RabbitMQ's switch to the Prometheus plugin for its metrics, which brings us to the second breaking change.


Metric source (Prometheus plugin)

RabbitMQ has switched to using the Prometheus plugin (rabbitmq_prometheus) as its source for metrics. This plugin runs a metrics server within the RabbitMQ server, which is scraped by the prom_scraper job co-located on the RabbitMQ Server VM. The prom_scraper job then packages the metrics as envelopes and sends them to Loggregator for consumer ingestion.

The rabbitmq-server job provides a config file that tells the prom_scraper job how to scrape the RabbitMQ metrics server. Here is an example prom_scraper_config.yml, located at /var/vcap/jobs/rabbitmq-server/config/prom_scraper_config.yml on a RabbitMQ Server VM:
rabbitmq-server/ec7a4fcc-7458-4264-a7e0-07f8f379a0c2:/var/vcap/jobs/rabbitmq-server/config$ cat prom_scraper_config.yml
---
port: 15692
source_id: rabbit@localhost
instance_id: 'rabbit@ec7a4fcc-7458-4264-a7e0-07f8f379a0c2.rabbitmq-server.service.service-instance-e98381aa-c21a-48c0-afb8-d9d7ffa9de54.bosh'
scheme: http
server_name: localhost

labels:
  origin: p.rabbitmq

This config file directs the prom_scraper job to scrape http://localhost:15692 for metrics. You may curl this endpoint on the RabbitMQ server to see the metrics, provided that the Prometheus plugin is enabled. There is a known issue where TLS Service Instances have a misconfigured prom_scraper_config.yml file; this is patched in RabbitMQ tile version 2.0.10. For more information, see the section titled Known Issue for TLS enabled Service Instances at the bottom of this article.
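
For example, while on the RabbitMQ Server VM (the rabbitmq_prometheus plugin serves metrics on the /metrics path):
$ curl -s http://localhost:15692/metrics | grep rabbitmq_queue_messages_ready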

For the prom_scraper to be able to scrape metrics from the RabbitMQ server, the rabbitmq_prometheus plugin must be enabled. This plugin is enabled by default in On-Demand Service Instances; however, Pre-Provisioned (shared) Service Instances may not have it enabled by default. Be sure that the rabbitmq_prometheus plugin is enabled in the tile:

[Screenshot: the rabbitmq_prometheus plugin enabled in the tile configuration]
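
You can also verify from the VM itself. A quick check, assuming rabbitmq-plugins is on the PATH (as rabbitmqctl is in the validation steps below):
$ sudo rabbitmq-plugins list rabbitmq_prometheus
#An enabled, running plugin is flagged [E*] (or [e*] if enabled implicitly) in the listing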


Now that we have covered how metrics are obtained from the RabbitMQ server and emitted to Loggregator, let's cover metric aggregation.

By default, "per queue" metrics are no longer emitted from a RabbitMQ Service Instance starting in RabbitMQ tile version 2.x; instead, they are aggregated together as a whole. To enable "per queue" metrics again, we must override the RabbitMQ server configuration in the plans. There are two types of overrides to choose from:
  • Expert Mode: Override Server Config - this is a sysctl-like format.
  • Expert Mode: Override Server Advanced Config - this is a base64 encoded string.

The configuration we need to override to allow "per queue" metrics is:
prometheus.return_per_object_metrics=true

The above string can be pasted in the Expert Mode: Override Server Config section of the On Demand plan.

[Screenshot: the override string entered in the Expert Mode: Override Server Config field of the On-Demand plan]
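
If you need more than one override in this field, the sysctl-like format takes one key = value pair per line. For example (the second setting is illustrative only, shown to demonstrate the format):
prometheus.return_per_object_metrics = true
vm_memory_high_watermark.relative = 0.6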


When this is enabled and applied we can see these changes reflected in a rabbitmq-server job configuration file on the RabbitMQ VM:
rabbitmq-server/ec7a4fcc-7458-4264-a7e0-07f8f379a0c2:/var/vcap/jobs/rabbitmq-server/etc/conf.d$ cat 50-overrideConfig.conf
prometheus.return_per_object_metrics=true

And to validate that the RabbitMQ Server is running with this configuration we can check the report:
rabbitmq-server/ec7a4fcc-7458-4264-a7e0-07f8f379a0c2:/var/vcap/jobs/rabbitmq-server/etc/conf.d$ sudo rabbitmqctl report | grep return_per_object_metrics
 {rabbitmq_prometheus,[{return_per_object_metrics,true}]},

By default, this setting is false. When we enable and apply it via the override, it flips it to true which enables "per queue" metrics again.

The Pre-Provisioned plan only has the Expert Mode: Override Server Advanced Config style of override, so let's cover that as well.

Essentially, we provide the same data but in the required base64-encoded format. This is covered in the docs, but an example is helpful. Suppose we want to apply the return_per_object_metrics=true and vm_memory_high_watermark=0.6 overrides to the RabbitMQ Server configuration for Pre-Provisioned RabbitMQ.
$ echo '[{rabbit, [{vm_memory_high_watermark, 0.6}]},{rabbitmq_prometheus,[{return_per_object_metrics,true}]}].' | base64
W3tyYWJiaXQsIFt7dm1fbWVtb3J5X2hpZ2hfd2F0ZXJtYXJrLCAwLjZ9XX0se3JhYmJpdG1xX3By
b21ldGhldXMsW3tyZXR1cm5fcGVyX29iamVjdF9tZXRyaWNzLHRydWV9XX1dLgo=
Our base64 string for this configuration is:
W3tyYWJiaXQsIFt7dm1fbWVtb3J5X2hpZ2hfd2F0ZXJtYXJrLCAwLjZ9XX0se3JhYmJpdG1xX3By
b21ldGhldXMsW3tyZXR1cm5fcGVyX29iamVjdF9tZXRyaWNzLHRydWV9XX1dLgo=
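
Note: GNU base64 wraps its output at 76 characters by default, which is why the encoded string above spans two lines. To produce a single-line string instead, disable wrapping with -w0:
$ echo '[{rabbit, [{vm_memory_high_watermark, 0.6}]},{rabbitmq_prometheus,[{return_per_object_metrics,true}]}].' | base64 -w0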


For validation, to see it decoded:
$ echo 'W3tyYWJiaXQsIFt7dm1fbWVtb3J5X2hpZ2hfd2F0ZXJtYXJrLCAwLjZ9XX0se3JhYmJpdG1xX3By
> b21ldGhldXMsW3tyZXR1cm5fcGVyX29iamVjdF9tZXRyaWNzLHRydWV9XX1dLgo=' | base64 --decode
[{rabbit, [{vm_memory_high_watermark, 0.6}]},{rabbitmq_prometheus,[{return_per_object_metrics,true}]}].

Once we have this, we apply it to the Expert Mode: Override Server Advanced Config section:
[Screenshot: the base64 string entered in the Expert Mode: Override Server Advanced Config field of the Pre-Provisioned plan]


When this is enabled and applied we can see these changes reflected in a rabbitmq-server job configuration file on the RabbitMQ VM:
rabbitmq-server/22c4c623-7d48-4375-b3a5-33a5d73a37fb:/var/vcap/jobs/rabbitmq-server/etc$ cat advanced.config
[{rabbit, [{vm_memory_high_watermark, 0.6}]},{rabbitmq_prometheus,[{return_per_object_metrics,true}]}].

And to validate that the RabbitMQ Server is running with this configuration we can check the report:
rabbitmq-server/22c4c623-7d48-4375-b3a5-33a5d73a37fb:/var/vcap/jobs/rabbitmq-server/etc$ sudo rabbitmqctl report | egrep 'vm_memory_high_watermark|return_per_object_metrics'
      {vm_memory_high_watermark,0.6},
 {rabbitmq_prometheus,[{return_per_object_metrics,true}]},


For completeness: the above covers doing all of this in the Ops Manager UI. If you wish to set these properties in a pipeline, the following property names may be referenced:

Pre-Provisioned Expert Mode: Override Server Advanced Config:

.properties.multitenant_support.enabled.override_advanced_config


On-Demand Expert Mode: Override Server Config:

Plan 1 is always enabled so its property looks like:

.properties.on_demand_broker_plan_1_override_config


Plan 2 is optional so its property name is different:

.properties.on_demand_broker_plan_2_selector.enabled.override_config


On-Demand Expert Mode: Override Server Advanced Config:

Plan 1 is always enabled so its property looks like:

.properties.on_demand_broker_plan_1_override_advanced_config


Plan 2 is optional so its property name is different:

.properties.on_demand_broker_plan_2_selector.enabled.override_advanced_config
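
As a sketch of how one of these property names might be used with the om CLI (the product name and file layout here are illustrative assumptions, not taken from the tile docs):
#product-config.yml, applied with: om configure-product --config product-config.yml
product-name: p-rabbitmq
product-properties:
  .properties.on_demand_broker_plan_1_override_config:
    value: prometheus.return_per_object_metrics=true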


Conclusion

There have been quite a few changes to the way RabbitMQ on TAS for VMs provides metrics. Upgrading from RabbitMQ tile version 1.x to 2.x may result in "missing" metrics. To get these metrics back, be aware of the following:

  • The metric name format has changed.
  • The RabbitMQ metrics source has changed to the Prometheus plugin. The rabbitmq_prometheus plugin must be enabled on the RabbitMQ Server for metrics to be available. It is enabled by default on On-Demand Service Instances, but may not be enabled by default for Pre-Provisioned Service Instances.
  • The prom_scraper job is responsible for scraping the RabbitMQ metrics server and making the metrics available to Loggregator.
  • RabbitMQ metrics are aggregated by default; the RabbitMQ Server configuration must be overridden with prometheus.return_per_object_metrics=true to restore metrics on a "per queue" basis.
  • There is a bug where prom_scraper is misconfigured for TLS enabled RabbitMQ Service Instances; this is patched in RabbitMQ tile version 2.0.10.


Note

Queue specific information is contained within envelope tags. This means that log ingestion systems should also forward envelope tags downstream along with the envelope. For example, starting with the Splunk Nozzle v1.2.3 there is a toggle that allows adding tags to envelopes (it is false by default).

 


Known Issue for TLS enabled Service Instances

The misconfiguration: for TLS enabled Service Instances, the prom_scraper config port is specified as 15692 but should be 15691. This has been patched and is now live on Tanzu Network; RabbitMQ tile version 2.0.10 was released on 2-18-2022. Due to this misconfiguration, metrics from RabbitMQ TLS Service Instances will be unavailable until you either upgrade to tile version 2.0.10, or manually edit this config file and monit restart the prom_scraper job so that it picks up the new configuration. Note that the manual change is only temporary and will be reverted when the Service Instance is updated. If you need to do this for an important Service Instance before upgrading, please contact Tanzu Support.
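
For reference, a sketch of that temporary manual workaround (assuming the monit job is named prom_scraper; remember the change is reverted on the next Service Instance update):
#On the affected TLS Service Instance's RabbitMQ Server VM
$ sudo sed -i 's/^port: 15692/port: 15691/' /var/vcap/jobs/rabbitmq-server/config/prom_scraper_config.yml
$ sudo monit restart prom_scraper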