In distributed systems where RabbitMQ acts as a core messaging backbone, monitoring the health of individual nodes and specific queues is essential for ensuring message reliability, throughput, and high availability. A failure or performance degradation in any node or queue can lead to message loss, processing delays, or service disruptions, especially with classic queues.
This guide explains how to assess the operational health of RabbitMQ nodes and inspect the state of queues using built-in RabbitMQ tooling. It is intended for engineers who need a systematic way to detect issues proactively, validate resilience under load, and respond to cluster anomalies in real time or during incident triage.
1. Health of a RabbitMQ node
rabbitmq-diagnostics is a command-line tool bundled with RabbitMQ that provides diagnostic, monitoring, and health check capabilities for RabbitMQ nodes and clusters. It is especially useful for inspecting node status, verifying cluster health, and running targeted checks during troubleshooting or routine maintenance.
For example, "rabbitmq-diagnostics check_running" verifies that the runtime is running and that the RabbitMQ application on it is not stopped or paused.
Although the probability of false positives is generally low, it is high for systems hovering around their runtime high memory watermark, and it can rise significantly during upgrades and maintenance windows.
Here is an example:
rabbitmq-server/c3d2####890c:~$ sudo rabbitmq-diagnostics check_running
Checking if RabbitMQ is running on node rabbit@c3d2####890c.rabbitmq-server.infra.service-instance-47ed####9508.bosh ...
RabbitMQ on node rabbit@c3d2####890c.rabbitmq-server.infra.service-instance-47ed####9508.bosh is fully booted and running
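Because a single failed check can be a false positive, one option is to retry the check a few times before treating the node as unhealthy. The following is a minimal sketch of that idea, not an official RabbitMQ recipe. The helper name check_with_retries and the attempt and delay values are hypothetical, and the sketch assumes only that the check command exits with a non-zero status on failure.

```shell
# Hypothetical helper: run a command up to <attempts> times, pausing
# <delay_seconds> between tries, and succeed as soon as one try succeeds.
# Usage: check_with_retries <attempts> <delay_seconds> <command> [args...]
# The body runs in a subshell, so its variables do not leak to the caller
# and "exit" only sets the function's return status.
check_with_retries() (
    attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$attempts" ]; do
        if "$@"; then
            exit 0
        fi
        # Pause before the next try, but not after the last one.
        if [ "$attempt" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    exit 1
)

# Example invocation (assumes sudo access and rabbitmq-diagnostics on PATH):
# check_with_retries 3 5 sudo rabbitmq-diagnostics check_running
```

How many retries are appropriate depends on how quickly an outage must be detected: more attempts reduce false alarms but delay the alert.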
2. Health of queues
rabbitmqctl is the primary command-line tool used to manage a RabbitMQ server node. It allows administrators to perform a wide range of tasks such as:
Inspecting node status (rabbitmqctl status)
Managing users and permissions
Controlling queues, exchanges, and bindings
Joining or removing nodes from a cluster
Resetting or stopping the RabbitMQ application
It communicates with the RabbitMQ node using Erlang inter-node (distribution) communication and authenticates using a shared secret known as the Erlang cookie. The "list_unresponsive_queues" command, which is also available through rabbitmq-diagnostics, identifies queues that are currently unavailable or unresponsive on a RabbitMQ node.
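We can create a script to iterate through all vhosts. The --no-table-headers flag keeps the "name" column header out of the list of vhosts.
for vhost in $(sudo rabbitmqctl list_vhosts --quiet --no-table-headers); do
    echo "Checking unresponsive queues in vhost: $vhost"
    sudo rabbitmq-diagnostics list_unresponsive_queues -p "$vhost" --timeout 5
done
Here is an example from a run that omitted --no-table-headers, so the first iteration treats the column header "name" as a vhost: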
rabbitmq-server/c3d2####890c:~$ for vhost in $(sudo rabbitmqctl list_vhosts --quiet); do
echo "Checking unresponsive queues in vhost: $vhost"
sudo rabbitmq-diagnostics list_unresponsive_queues -p "$vhost" --timeout 5
done
Checking unresponsive queues in vhost: name
Listing unresponsive queues for vhost name ...
Checking unresponsive queues in vhost: 47ed####9508
Listing unresponsive queues for vhost 47ed####9508 ...
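In the run above, no queue names follow the "Listing unresponsive queues" lines, so nothing was reported as unresponsive. For automated checks it can help to turn that output into a number. The sketch below is one possible approach, not part of the RabbitMQ tooling. The helper name count_nonempty_lines is hypothetical, and the commented usage assumes that, with --quiet and --no-table-headers, the command prints one line per unresponsive queue and nothing else.

```shell
# Hypothetical helper: print the number of lines on stdin that contain
# at least one non-whitespace character.
count_nonempty_lines() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the helper's exit status at 0.
    grep -c '[^[:space:]]' || true
}

# Example usage (assumes sudo access and the RabbitMQ CLI tools on PATH):
# for vhost in $(sudo rabbitmqctl list_vhosts --quiet --no-table-headers); do
#     n=$(sudo rabbitmq-diagnostics list_unresponsive_queues -p "$vhost" \
#             --timeout 5 --quiet --no-table-headers | count_nonempty_lines)
#     [ "$n" -eq 0 ] || echo "vhost $vhost: $n unresponsive queue(s)"
# done
```

A monitoring job could alert whenever the count for any vhost is greater than zero.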