If the user has configured a disk space usage alarm and it gets triggered, you will see warning messages like the below in the node logs:
[warning] <0.643.0> *** Publishers will be blocked until this alarm clears ***
[warning] <0.643.0> **********************************************************
[warning] <0.643.0> disk resource limit alarm set on node '<nodename>'.
[warning] <0.643.0> **********************************************************
This scenario can cause a production outage: once the alarm is set, all publishers are blocked until free disk space rises back above the configured limit and the alarm clears.
https://www.rabbitmq.com/docs/alarms
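For context, the alarm fires when free disk space on the node drops below the configured disk_free_limit (50 MB by default, which is far too low for production use). With a relative limit, the threshold is a multiple of installed RAM. A quick sanity check of the effective threshold (the figures below are hypothetical):

```shell
# Hypothetical figures: a node with 16 GiB of RAM and disk_free_limit.relative = 1.5
# blocks publishers once free disk space drops below 1.5 * 16 = 24 GiB.
awk 'BEGIN { ram_gib = 16; rel = 1.5; printf "%.0f GiB\n", ram_gib * rel }'
```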
Usually it is caused by segment files generated by quorum queue operations filling up the disk.
https://www.rabbitmq.com/docs/quorum-queues#resource-use
It happens when publishers publish messages faster than consumers can consume them, so the backlog accumulates on disk.
It is considered an environment design issue: either the disk is too small for the scale of the production workload, or publish and consume rates are not balanced.
Once the disk is full, this is considered a non-recoverable state: the segment files are written by the quorum queue itself, and there is no way to tell which parts of a file are safe to discard. To recover, the customer can:
- Delete the offending queue with rabbitmqctl delete_queue, if the customer can identify which queue is responsible. After the alarm clears, redeclare the queue.
- Replace the affected node. Reference: https://www.rabbitmq.com/docs/quorum-queues#replica-management
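A sketch of the two recovery paths above, assuming a queue named "orders" on the default vhost, an affected node rabbit@node1, and a healthy spare node rabbit@node3 (all names hypothetical; adjust to the customer's environment):

```shell
# Option 1: delete the queue that is filling the disk, then redeclare it
# from the application once the alarm clears.
rabbitmqctl delete_queue orders -p /

# Option 2: move the replica off the affected node (see the replica-management
# docs linked above): add a member on a healthy node, then remove the old one.
rabbitmq-queues add_member orders rabbit@node3 -p /
rabbitmq-queues delete_member orders rabbit@node1 -p /
```

These commands require a running node that can still respond to CLI tools; if the node is completely wedged, replacing the node itself may be the only option.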
Customers may ask follow-up questions such as:
1. Why does purging the queue not work?
The warning and alarm are disk-space related. Purging a queue is not equivalent to "rm -rf": it removes messages from the queue, but does not immediately reclaim the segment files on disk.
2. Can we delete the files manually?
It is not recommended; removing segment files out from under a running quorum queue can leave its log in an inconsistent state.
3. How can this be prevented?
Closely monitor the cluster and its disk usage, alert well before the limit is reached, and provision sufficient disk space for the workload.
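One concrete prevention step is to raise the free-disk threshold in rabbitmq.conf so the alarm, and any monitoring alerts built on top of it, fire well before the disk is actually full; the value below is only an example and should be sized for the workload:

```ini
# rabbitmq.conf -- example value only, size this for your environment
disk_free_limit.absolute = 10GB
```

During operation, `rabbitmq-diagnostics alarms` reports whether any resource alarm is currently in effect on the node.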