RabbitMQ: Some monitoring alerts tied to consumer performance might permanently trigger or fail



Article ID: 444235


Updated On:

Products

RabbitMQ

Issue/Introduction

After upgrading a RabbitMQ cluster to version 4.3.1 or later, monitoring alerts tied to consumer performance might fire permanently or stop working as intended. Specifically, for Quorum Queues with active consumers, two core Prometheus metrics abruptly drop to 0.0 and never recover.


Impacted Metrics

  • rabbitmq_detailed_queue_consumer_capacity

  • rabbitmq_detailed_queue_consumer_utilisation

Environment

Reproduction Steps

  1. Spin up a single-node or clustered RabbitMQ 4.3.1 broker.

  2. Create a durable Quorum Queue (e.g., demo-qq).

  3. Attach an active consumer to the queue.

  4. Scrape the Prometheus metrics endpoint or query the Management API.
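
Step 4 can be sketched as a small Python helper that scrapes the endpoint and keeps only the two impacted metrics. This is a minimal sketch, not an official tool: the scrape URL is passed in by the caller (the port and path depend on how the rabbitmq_prometheus plugin is configured in your environment), and the function names parse_impacted and scrape are hypothetical.

```python
import re
import urllib.request

# Matches labelled sample lines such as:
#   rabbitmq_detailed_queue_consumer_capacity{vhost="/",queue="demo-qq"} 0.0
# Simplification: label values containing "}" are not handled.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

IMPACTED = (
    "rabbitmq_detailed_queue_consumer_capacity",
    "rabbitmq_detailed_queue_consumer_utilisation",
)


def parse_impacted(text):
    """Return {(metric, vhost, queue): value} for the two impacted metrics."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# TYPE" / "# HELP" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") not in IMPACTED:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (match.group("name"), labels.get("vhost"), labels.get("queue"))
        result[key] = float(match.group("value"))
    return result


def scrape(url):
    """Fetch the raw metrics text from a caller-supplied endpoint URL."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Calling parse_impacted(scrape(url)) against an affected broker should return 0.0 for every Quorum Queue that has an active consumer.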


Observed Result in RabbitMQ 4.3.1+:

# TYPE rabbitmq_detailed_queue_consumer_capacity gauge
# HELP rabbitmq_detailed_queue_consumer_capacity Consumer capacity
rabbitmq_detailed_queue_consumer_capacity{vhost="/",queue="demo-qq"} 0.0

# TYPE rabbitmq_detailed_queue_consumer_utilisation gauge
# HELP rabbitmq_detailed_queue_consumer_utilisation Same as consumer capacity
rabbitmq_detailed_queue_consumer_utilisation{vhost="/",queue="demo-qq"} 0.0

Expected (Legacy) Result in RabbitMQ 4.2.x:

When a queue has active consumers but no current message flow (idle), these metrics previously returned 1.0 (100%):

rabbitmq_detailed_queue_consumer_utilisation{vhost="/",queue="demo-qq"} 1.0
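
To illustrate why the change from 1.0 to 0.0 affects alerting, the sketch below models a hypothetical "low consumer capacity" rule that fires when the metric stays below a threshold for several consecutive scrapes. The rule, the threshold of 0.5, and the three-sample window are assumptions made for illustration and are not taken from any shipped alert definition.

```python
def low_capacity_alert(samples, threshold=0.5, for_samples=3):
    """Return True if the most recent `for_samples` values are all below `threshold`."""
    if len(samples) < for_samples:
        return False
    return all(value < threshold for value in samples[-for_samples:])


# Idle queue with an active consumer, as reported by 4.2.x (legacy behavior).
idle_on_4_2 = [1.0, 1.0, 1.0, 1.0]

# The same idle queue as reported by 4.3.1 and later.
idle_on_4_3_1 = [0.0, 0.0, 0.0, 0.0]

print(low_capacity_alert(idle_on_4_2))    # False: the rule stays quiet
print(low_capacity_alert(idle_on_4_3_1))  # True: the rule fires and never resolves
```

Because the value never recovers from 0.0, a rule of this shape stays in the firing state for as long as the queue exists.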

Cause

In RabbitMQ 4.3.1, the underlying Raft implementation (the Ra library) was upgraded to version 3, migrating Quorum Queues to use rabbit_fifo version 8 (merged under PR #13885).

During this extensive rewrite, the background tick handler stopped writing usage statistics into the rabbit_fifo_usage ETS (Erlang Term Storage) table. When the upper-layer tick handler (rabbit_quorum_queue:handle_tick/3) attempts to read from this now-empty table via rabbit_fifo:usage/1, the lookup finds no entry and safely defaults to 0.0.
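
The read-with-default behavior described above can be modelled with a short Python analogy. This is not the Erlang implementation: a dict stands in for the ETS table, and the function names, the write_stats flag, and the key format are hypothetical.

```python
def tick(table, key, utilisation, write_stats):
    """Model of the background tick: optionally record the usage statistic."""
    if write_stats:
        table[key] = utilisation


def usage(table, key):
    """Model of the reader: a missing entry falls back to 0.0 instead of failing."""
    return table.get(key, 0.0)


# Legacy behavior: the tick writes the statistic, so the reader sees it.
legacy_table = {}
tick(legacy_table, "demo-qq", 1.0, write_stats=True)
print(usage(legacy_table, "demo-qq"))   # 1.0

# Behavior after the rewrite: nothing is written, so the reader sees the default.
current_table = {}
tick(current_table, "demo-qq", 1.0, write_stats=False)
print(usage(current_table, "demo-qq"))  # 0.0
```

The reader never raises an error in either case, which is why the symptom is a silent 0.0 rather than a missing metric or a crash.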

Resolution

consumer_utilisation is a legacy metric originally designed for Classic Queues. Because Quorum Queues use very different Raft-based scheduling and a different internal architecture, this metric never accurately reflected true consumer utilization for Quorum Queues.


As part of code optimization and the deprecation of legacy behaviors in the 4.x release line, tracking for this metric was intentionally dropped for Quorum Queues.