A RabbitMQ cluster may stop functioning normally because of a significant increase in memory utilization on one or more nodes. This can make the entire cluster unresponsive, causing failures in applications connected to any node in the cluster. The affected node is not immediately marked as out of service, and the cluster does not recover until that node is restarted.
This issue can occur on all RabbitMQ versions.
This issue is typically caused by excessive memory usage on one or more RabbitMQ nodes. Because nodes consume varying amounts of memory and disk space depending on the workload, usage spikes can push both memory and disk space to critical levels.
When memory usage spikes to a dangerous level, the node can be terminated by the operating system's OOM killer (Out of Memory killer).
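If a node disappears unexpectedly, the kernel log can confirm whether the OOM killer terminated it. The following is a minimal check, assuming a Linux host (the RabbitMQ node runs inside the Erlang VM process, usually named beam.smp):

    # Search the kernel log for OOM killer activity (Linux)
    dmesg -T | grep -i -E "out of memory|oom-killer|killed process"
    # On systemd-based hosts, the kernel journal can be searched the same way
    journalctl -k | grep -i -E "out of memory|oom-killer|killed process"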
1. Monitor Memory and Disk Usage:
- Track memory and disk usage on every RabbitMQ node to ensure that neither reaches critical levels (example commands are shown below).
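- For example, the built-in RabbitMQ CLI tools can report per-node memory and disk figures (a minimal sketch; run the commands on, or point them at, the node being checked):

    # Per-category memory usage on the local node
    rabbitmq-diagnostics memory_breakdown
    # Overall node status, including memory use, free disk space, and active alarms
    rabbitmq-diagnostics status
    # Exits non-zero if a memory or disk alarm is in effect on the node
    rabbitmq-diagnostics check_local_alarms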
2. Configure vm_memory_high_watermark:
- Adjust the vm_memory_high_watermark setting in RabbitMQ to avoid memory-related issues. This setting controls how much memory RabbitMQ may use before it raises a memory alarm and blocks connections that publish messages.
- Lowering the watermark leaves more headroom for usage spikes and makes it less likely that a node exhausts memory before publishers are blocked (see the example configuration below).
vm_memory_high_watermark.relative = <suitable_value_as_per_environment>
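As a concrete sketch, a new-style rabbitmq.conf might set the watermark as follows; the 0.6 relative value and the 2GB absolute value are illustrative assumptions only and must be tuned for the environment:

    # Raise the memory alarm (and block publishers) once the node uses
    # 60% of detected RAM (illustrative value)
    vm_memory_high_watermark.relative = 0.6
    # Alternatively, an absolute limit can be used instead of the relative one:
    # vm_memory_high_watermark.absolute = 2GB

Changes made in rabbitmq.conf take effect after a node restart; the effective limit is also reported in the output of rabbitmq-diagnostics status.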