Troubleshooting steps to perform when a RabbitMQ node doesn't join a cluster

Article ID: 297322

Products

Support Only for OpenSource RabbitMQ

Resolution

This article provides a checklist of troubleshooting steps to perform before opening a support ticket when a RabbitMQ node does not join a cluster. Before you begin troubleshooting, enable debug logging in the cluster. For each step below, capture the output of the CLI command and the corresponding logs so that they can be included in the support ticket.

1. A log entry like the one below, containing 'Invalid challenge reply', indicates that the Erlang cookies on the nodes do not match.

<0.3490.9> ** Connection attempt from node ‘xxxxxxx’ rejected. Invalid challenge reply. **

To confirm that the Erlang cookie is the same on all nodes, inspect the contents of the cookie file at /var/lib/rabbitmq/.erlang.cookie on each node. If the cookie on the node that you are trying to connect to is different, change it so that the cookies match. Even if the file contents are the same, run rabbitmqctl eval 'erlang:get_cookie().' or rabbitmq-diagnostics erlang_cookie_hash -q on each node, because the environment variable RABBITMQ_ERLANG_COOKIE can override the file.
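
When cookie files have to be compared across several hosts, comparing a digest avoids pasting the secret itself into a ticket or chat. The following is a minimal sketch, assuming GNU coreutils (sha256sum) is installed. The helper name cookie_digest is hypothetical and is not part of RabbitMQ. The digest it prints is only meant to be compared with the same helper's output on other nodes, not with the output of rabbitmq-diagnostics erlang_cookie_hash.

```shell
# Hypothetical helper: print the SHA-256 digest of an Erlang cookie file.
# Defaults to the cookie path named in this article.
cookie_digest() {
    cookie_file="${1:-/var/lib/rabbitmq/.erlang.cookie}"
    if [ ! -r "$cookie_file" ]; then
        echo "cannot read $cookie_file" >&2
        return 1
    fi
    # sha256sum prints "<digest>  -" when reading stdin; keep only the digest.
    sha256sum < "$cookie_file" | cut -d' ' -f1
}
```

Run cookie_digest on every node with a user that can read the cookie file, and compare the printed values. The comparison is byte-for-byte, so a difference in trailing whitespace also changes the digest.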

2. Test hostname resolution by running the rabbitmq-diagnostics resolve_hostname command on the node that you want to join to the cluster.

3. Ensure that the node name uses the same format (short or long) as the rest of the nodes by verifying the value of RABBITMQ_USE_LONGNAME.
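
As a quick sanity check for the name format: Erlang short names have no dot in the host part (for example rabbit@node1), while long names use a fully qualified host (for example rabbit@node1.example.com). The following is a minimal sketch of that classification. The function name node_name_kind and the example node names are hypothetical.

```shell
# Hypothetical helper: classify a node name as "short" or "long"
# based on whether the host part (after "@") contains a dot.
node_name_kind() {
    host="${1#*@}"    # strip everything up to and including the first "@"
    case "$host" in
        *.*) echo long ;;
        *)   echo short ;;
    esac
}
```

All cluster members should report the same kind. A node started with RABBITMQ_USE_LONGNAME=true is expected to have a long name.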

4. Enable all feature flags by running rabbitmqctl enable_feature_flag all.

5. Run the rabbitmqctl join_cluster command on the new node, specifying a node of the existing cluster as the target. Refer to https://www.rabbitmq.com/rabbitmqctl.8.html#join_cluster for additional instructions.

If none of these steps resolves the issue, you can reset the node and/or force the cluster to forget the node's previous cluster state. Review the links below for more information before proceeding.

https://www.rabbitmq.com/rabbitmqctl.8.html#forget_cluster_node

https://www.rabbitmq.com/clustering.html#resetting-nodes


6. Reset the new node, then run join_cluster again.
7. Run forget_cluster_node on a node of the existing cluster to force the removal of the new node from the cluster, then run join_cluster again.
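
The reset-and-rejoin sequence can be sketched as a small script. This is an illustration under stated assumptions, not a supported procedure: the wrapper names run and rejoin_cluster, the DRY_RUN variable, and the node name rabbit@node1 are hypothetical. The rabbitmqctl subcommands (stop_app, reset, join_cluster, start_app) are the ones documented in the links above. Note that rabbitmqctl reset removes the local node's data and cluster membership.

```shell
# Hypothetical wrapper: print each command, and execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# Reset the local (new) node and join it to an existing cluster member.
# WARNING: "rabbitmqctl reset" wipes the local node's data.
rejoin_cluster() {
    target="$1"    # a running member of the existing cluster, e.g. rabbit@node1
    run rabbitmqctl stop_app &&
    run rabbitmqctl reset &&
    run rabbitmqctl join_cluster "$target" &&
    run rabbitmqctl start_app
}
```

Setting DRY_RUN=1 before calling rejoin_cluster rabbit@node1 prints the four commands without executing them, which is a convenient way to review the sequence first. If the cluster still lists the node from an earlier attempt, run rabbitmqctl forget_cluster_node with that node's name on an existing cluster member before rejoining.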