RabbitMQ stop_app Hangs After Mnesia Removal and Simultaneous Node Start

Article ID: 391929

Products

VMware Tanzu RabbitMQ

Issue/Introduction

RabbitMQ's stop_app operation may hang after the Mnesia database has been removed in response to a network partition event, especially when multiple nodes are then restarted simultaneously.

Environment

All RabbitMQ and Erlang versions

Cause

This issue is primarily caused by:

  1. Cluster state inconsistency: Removing the Mnesia database without proper cluster shutdown can lead to inconsistent cluster state information across nodes.
  2. Race conditions during restart: Restarting nodes simultaneously can cause conflicts while the cluster re-forms.
  3. Lingering effects of network partitions: State left over from the partition may continue to disrupt inter-node communication.

Resolution

1. Stop the RabbitMQ application on each node, with a timeout (in seconds) so the command does not wait indefinitely:

   rabbitmqctl stop_app --timeout 60
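
When several nodes are involved, step 1 can be scripted as a loop over the nodes. This is a minimal sketch, not an official tool. It assumes every node is reachable from one host with rabbitmqctl -n (which requires a shared Erlang cookie), the node names are hypothetical, and the RABBITMQCTL variable is an assumption added only so the real CLI can be swapped for a stub during a dry run. Note that --timeout limits how long the CLI waits; it does not kill a hung node.

```shell
#!/usr/bin/env bash
# Sketch: stop the RabbitMQ application on each listed node, one at a time.
# Assumption (hypothetical): RABBITMQCTL may be overridden with a stub command
# or shell function for a dry run; it defaults to the real rabbitmqctl.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# Usage: stop_all_apps <timeout-seconds> <node> [<node> ...]
stop_all_apps() {
  local timeout="$1"; shift
  local node rc=0
  for node in "$@"; do
    # --timeout bounds how long the CLI waits for stop_app to finish.
    if ! "$RABBITMQCTL" -n "$node" stop_app --timeout "$timeout"; then
      echo "stop_app did not complete on $node" >&2
      rc=1  # remember the failure but still try the remaining nodes
    fi
  done
  return "$rc"
}
```

For example: stop_all_apps 60 rabbit@node1 rabbit@node2 rabbit@node3 (hypothetical node names). The loop keeps going after a failure so every node receives a stop request, and it returns non-zero if any node did not stop within the timeout.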

2. Remove the contents of the Mnesia directory on all nodes:

   sudo rm -rf /var/lib/rabbitmq/mnesia/*
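
Step 2 can be wrapped in a small guard so the delete never runs against an unexpected path. This is a minimal sketch under stated assumptions: the function name is hypothetical, it uses the RABBITMQ_MNESIA_BASE environment variable if it is set in the calling shell and otherwise falls back to the package default /var/lib/rabbitmq/mnesia shown above, and it must run with enough privileges to delete those files (the original command uses sudo).

```shell
#!/usr/bin/env bash
# Sketch: empty the Mnesia base directory on the local node.
# Directory precedence: explicit argument, then RABBITMQ_MNESIA_BASE,
# then the package default /var/lib/rabbitmq/mnesia.
# Usage (hypothetical helper): clear_mnesia_dir [<directory>]
clear_mnesia_dir() {
  local dir="${1:-${RABBITMQ_MNESIA_BASE:-/var/lib/rabbitmq/mnesia}}"
  # Refuse to act on the filesystem root or on a path that is not a directory.
  if [ "$dir" = / ] || [ ! -d "$dir" ]; then
    echo "refusing to clear '$dir'" >&2
    return 1
  fi
  # Same effect as the rm command in step 2: delete the entries inside the
  # directory but keep the directory itself. With -f, an already empty
  # directory (unexpanded glob) is not treated as an error.
  rm -rf -- "$dir"/*
}
```

Run it on each node only after step 1 has completed on that node.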

3. Restart the nodes sequentially, with a delay between starts:

   Start one node at a time, and wait for it to initialize fully before starting the next.
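
Step 3 can be scripted so that each node is confirmed up before the next one is started. This sketch assumes the Erlang VM on each node is still running (step 1 used stop_app, which leaves the VM up, so start_app can be issued with rabbitmqctl -n), that nodes are listed in the intended start order, and that the function name, the RABBITMQCTL override, and the delay value are illustrative. It relies on rabbitmqctl await_startup to block until the node reports that it has finished booting.

```shell
#!/usr/bin/env bash
# Sketch: start nodes one at a time, waiting for each to finish booting.
# Assumption (hypothetical): RABBITMQCTL may be overridden for a dry run.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# Usage: start_nodes_sequentially <delay-seconds> <node> [<node> ...]
start_nodes_sequentially() {
  local delay="$1"; shift
  local node first=1
  for node in "$@"; do
    # Pause between nodes, but not before the first one.
    if [ "$first" -eq 0 ]; then
      sleep "$delay"
    fi
    first=0
    "$RABBITMQCTL" -n "$node" start_app || return 1
    # Block until the RabbitMQ application on this node has fully started.
    "$RABBITMQCTL" -n "$node" await_startup || return 1
  done
}
```

For example: start_nodes_sequentially 30 rabbit@node1 rabbit@node2 rabbit@node3 (hypothetical node names and delay). The function stops at the first node that fails to start instead of continuing, so a problem on one node is not repeated across the rest of the cluster.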

Additional Information