A RabbitMQ cluster may stop functioning normally because of a significant increase in memory utilization on one or more nodes. This can make the entire cluster unresponsive, causing failures in applications connected to any node in the cluster. The affected node is not immediately marked as out of service, and the cluster does not recover until that node is restarted.
This issue can occur on all RabbitMQ versions.
This issue is typically caused by excessive memory usage on one or more RabbitMQ nodes. Because nodes consume varying amounts of memory and disk space depending on the workload, usage spikes can push both memory and disk space to critical levels.
When memory usage spikes to a dangerous level, the node can be terminated by the operating system's OOM killer (Out of Memory killer).
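If a node disappears unexpectedly, the kernel log can confirm whether the OOM killer terminated it. The following is a minimal check, assuming a Linux host (the RabbitMQ node runs inside the Erlang VM process, usually named beam.smp):

    # Search the kernel log for OOM killer activity (Linux)
    dmesg -T | grep -i -E "out of memory|oom-killer|killed process"
    # On systemd-based hosts, the kernel journal can be searched the same way
    journalctl -k | grep -i -E "out of memory|oom-killer|killed process"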
1. Monitor Memory and Disk Usage:
- Track memory and disk usage on every RabbitMQ node to ensure that neither reaches critical levels (example commands are shown below).
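- For example, the built-in RabbitMQ CLI tools can report per-node memory and disk figures (a minimal sketch; run the commands on, or point them at, the node being checked):

    # Per-category memory usage on the local node
    rabbitmq-diagnostics memory_breakdown
    # Overall node status, including memory use, free disk space, and active alarms
    rabbitmq-diagnostics status
    # Exits non-zero if a memory or disk alarm is in effect on the node
    rabbitmq-diagnostics check_local_alarms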
2. Configure vm_memory_high_watermark:
- Adjust the vm_memory_high_watermark setting in RabbitMQ to avoid memory-related issues. This setting controls how much memory RabbitMQ may use before it raises a memory alarm and blocks connections that publish messages.
- Lowering the watermark leaves more headroom for usage spikes and makes it less likely that a node exhausts memory before publishers are blocked (see the example configuration below).
vm_memory_high_watermark.relative = <suitable_value_as_per_environment>
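As a concrete sketch, a new-style rabbitmq.conf might set the watermark as follows; the 0.6 relative value and the 2GB absolute value are illustrative assumptions only and must be tuned for the environment:

    # Raise the memory alarm (and block publishers) once the node uses
    # 60% of detected RAM (illustrative value)
    vm_memory_high_watermark.relative = 0.6
    # Alternatively, an absolute limit can be used instead of the relative one:
    # vm_memory_high_watermark.absolute = 2GB

Changes made in rabbitmq.conf take effect after a node restart; the effective limit is also reported in the output of rabbitmq-diagnostics status.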