High CPU usage on idle RabbitMQ nodes after upgrading from version 3.12.x to 3.13.x

Article ID: 393112

Updated On: 04-03-2025

Products

Services Suite

Issue/Introduction

After upgrading from version 3.12.x to 3.13.x, monitoring tools report high CPU usage on idle RabbitMQ nodes.

Environment

All RabbitMQ versions 3.13 and above

Cause

After the upgrade to RabbitMQ 3.13.x, queue replicas, federation links, shovels, and the HTTP API clients (or Prometheus scrapers) that hit specific nodes can be distributed across the cluster differently than they were on 3.12.x. This can contribute to an increase in CPU usage between RabbitMQ versions 3.12.x and 3.13.x.

Nodes without client connections can still host queue and stream replicas, shovels, and federation links, handle HTTP API requests, and so on. Such idle nodes might still run background tasks (HTTP API requests, federation links, metrics collection) that consume CPU.
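To see how much replica hosting a supposedly idle node does, the queue list from the management HTTP API can be tallied per node. The following is a minimal sketch, not an official tool: the function name `replicas_per_node` and the sample data are hypothetical, and it assumes each queue object carries a `node` field plus, for replicated queues, a `members` list, as returned by `GET /api/queues`.

```python
from collections import Counter


def replicas_per_node(queues):
    """Count how many queue replicas each node hosts.

    `queues` is a list of dicts shaped like the items returned by
    GET /api/queues. Assumption: replicated queues expose a "members"
    list of node names; other queues expose only "node".
    """
    counts = Counter()
    for queue in queues:
        # Use the full member list when present, else the single hosting node.
        members = queue.get("members") or [queue["node"]]
        counts.update(members)
    return counts


# Hypothetical sample data: node3 has no client connections,
# yet it still hosts a replica of the "orders" queue.
sample = [
    {
        "name": "orders",
        "node": "rabbit@node1",
        "members": ["rabbit@node1", "rabbit@node2", "rabbit@node3"],
    },
    {"name": "audit", "node": "rabbit@node2"},
]

for node, count in sorted(replicas_per_node(sample).items()):
    print(node, count)
```

A node that appears in this tally is doing replication work even when no client is connected to it.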

Resolution

A few general recommendations apply to "moderately loaded" systems where a large percentage of connections and queues can go idle from time to time. Such systems can often reduce their CPU footprint, significantly with some workloads, by taking a few straightforward steps.

1. CPU usage is by definition a very workload-dependent metric. Some workloads naturally use more CPU resources. Others use disk-heavy features such as quorum queues, and if disk I/O throughput is insufficient, CPU resources will be wasted while nodes are busy waiting for I/O operations to complete. Nodes will very rarely use the same amount of resources: some will inevitably have more connections than others, even if only slightly more, some queues and streams are busier than others, and so on.

How to reduce the CPU footprint on RabbitMQ idle nodes

2. To reduce the monitoring footprint, lower the monitoring frequency and make sure that the monitoring tool queries only for the data it needs. The recommended metric collection interval for production is 30 seconds, or another suitable value in the 30 to 60 second range. The Prometheus exporter API is designed to be scraped every 15 to 30 seconds, including on production systems.

Frequency of Monitoring
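The monitoring recommendation above can be sketched as a small poller that requests only the fields it needs and waits 30 seconds between requests. This is an illustrative sketch under stated assumptions, not a reference client: the helper names, the base URL, and the chosen columns are hypothetical, and it relies on the `columns` query parameter of the management HTTP API to limit the returned fields.

```python
import base64
import json
import time
import urllib.parse
import urllib.request

# Within the 30 to 60 second range recommended for production.
POLL_INTERVAL_SECONDS = 30


def build_queues_url(base_url, columns):
    """Build an /api/queues URL that asks only for the listed columns."""
    query = urllib.parse.urlencode({"columns": ",".join(columns)}, safe=",")
    return f"{base_url.rstrip('/')}/api/queues?{query}"


def fetch_queues(base_url, user, password, columns):
    """Fetch the reduced queue listing using HTTP basic authentication."""
    request = urllib.request.Request(build_queues_url(base_url, columns))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def poll(base_url, user, password, columns, iterations):
    """Poll a fixed number of times, sleeping between requests."""
    for _ in range(iterations):
        for queue in fetch_queues(base_url, user, password, columns):
            print(queue)
        time.sleep(POLL_INTERVAL_SECONDS)
```

For example, `poll("http://localhost:15672", "monitoring", "secret", ["name", "messages"], 10)` would issue ten requests over about five minutes, assuming a management listener on that address and a user with monitoring permissions. Requesting two columns instead of the full queue objects keeps each response small and reduces the work the node does per request.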
