RabbitMQ queues in a "down" state lead to high memory consumption


Article ID: 293214



Products

VMware RabbitMQ

Issue/Introduction

The RabbitMQ smoke tests failed during an upgrade:

• Failure [71.625 seconds]
           Smoke tests
           /var/vcap/packages/cf-rabbitmq-smoke-tests/src/rabbitmq-smoke-tests/tests/smoke_tests_test.go:15
             pushes an app, sends, and reads a message from RabbitMQ: plan 'standard' [It]
             /var/vcap/packages/cf-rabbitmq-smoke-tests/src/rabbitmq-smoke-tests/tests/smoke_tests_test.go:82

             Expected
                 <int>: 500
             to be <
                 <int>: 300

 

We checked the health of the RabbitMQ cluster by logging in to the management dashboard. The first node was using over 7 GB of memory, substantially above the memory high watermark. Checking the queues, we found 8 in a down state, with their statistics shown as "NaN" (not a number).
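The same check can be made from the command line. The following is a minimal sketch, assuming the affected queues report the state "down" in rabbitmqctl list_queues output (a crashed queue may report a different state); the function name filter_down_queues is hypothetical.

```shell
#!/bin/sh
# Print the names of queues whose state column is exactly "down".
# Input: tab-separated "name<TAB>state" lines, as printed by
#   rabbitmqctl list_queues name state
# Banner and header lines are ignored because their second
# tab-separated field is never "down".
filter_down_queues() {
    awk -F '\t' '$2 == "down" { print $1 }'
}
```

On a cluster node this would be used as: rabbitmqctl list_queues name state | filter_down_queues. Without -p, list_queues reports on the default "/" virtual host only.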


Environment

Product Version: 2.0

Resolution

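The 8 queues in the NaN state were removed by running the following rabbitmqctl eval command once per queue, replacing queue-name with the name of the queue (<<"/">> is the virtual host, here the default one):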

rabbitmqctl eval 'Q = rabbit_misc:r(<<"/">>, queue, <<"queue-name">>), rabbit_amqqueue:internal_delete(Q, <<"cli">>).'


Note: Running the rabbitmqctl eval command deletes the queue together with all of its messages. Those messages are lost.

We had to use this command because the queues could not be deleted through the RabbitMQ Management UI. In some cases, queues in the NaN state can be deleted through the Management UI.
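Because the command has to be repeated for every affected queue, a small wrapper can help avoid typing mistakes. This is only a sketch: the function names and the APPLY variable are hypothetical, and queue or vhost names containing double quotes or backslashes are not escaped. It prints each command for review by default.

```shell
#!/bin/sh
# Build the Erlang expression from this article for one vhost/queue pair.
# $1 = virtual host, $2 = queue name. No escaping is performed.
build_delete_expr() {
    printf 'Q = rabbit_misc:r(<<"%s">>, queue, <<"%s">>), rabbit_amqqueue:internal_delete(Q, <<"cli">>).\n' "$1" "$2"
}

# Dry run by default: print the command so it can be reviewed first.
# Set APPLY=1 to run it; this discards the queue and its messages.
delete_down_queue() {
    expr_text=$(build_delete_expr "$1" "$2")
    if [ "${APPLY:-0}" = "1" ]; then
        rabbitmqctl eval "$expr_text"
    else
        printf "rabbitmqctl eval '%s'\n" "$expr_text"
    fi
}
```

For example, delete_down_queue / queue-name prints the command shown above, and APPLY=1 delete_down_queue / queue-name executes it.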

The apps that use the queues will recreate them automatically as needed. We then restarted RabbitMQ:

rabbitmqctl stop_app
rabbitmqctl start_app


This flushed the excess memory in use on the primary node.

After performing these steps, you can re-run the operations that failed because RabbitMQ was down.

ALWAYS check the health of your RabbitMQ cluster before performing maintenance or upgrade tasks.
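One way to make this check routine is a short pre-maintenance script. The sketch below assumes a RabbitMQ release that ships the rabbitmq-diagnostics health-check commands; the function name preflight_check and the DIAG variable are hypothetical, and the set of checks is illustrative rather than complete. A memory alarm like the one in this incident would be reported by check_local_alarms.

```shell
#!/bin/sh
# Diagnostics binary; override DIAG to point at a different path.
DIAG=${DIAG:-rabbitmq-diagnostics}

# Run each health check in turn and report the result.
# Returns non-zero if at least one check fails.
preflight_check() {
    failed=0
    for check in check_running check_local_alarms check_port_connectivity; do
        if "$DIAG" -q "$check"; then
            echo "ok: $check"
        else
            echo "FAILED: $check"
            failed=1
        fi
    done
    return "$failed"
}
```

Run preflight_check on each node and proceed with the upgrade only if it returns zero. The checks above do not report queue state, so listing queues and their states separately is still worthwhile.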