rabbitmq-server job fails with previous_upgrade_failed error and is unable to join the cluster
search cancel

rabbitmq-server job fails with previous_upgrade_failed error and is unable to join the cluster

book

Article ID: 293240

calendar_today

Updated On:

Products

VMware RabbitMQ

Issue/Introduction

A 3-node RabbitMQ cluster deployed through RabbitMQ for Tanzu VM on-demand service instance has a BOSH instance losing its persistent disk. However after the persistent disk reference is removed from BOSH and the instance is recreated with a new persistent disk, rabbitmq-server job on the new instance still fails to start and the following errors are observed in logs.
2023-04-18 08:53:07.181272+00:00 [error] <0.223.0> BOOT FAILED
2023-04-18 08:53:07.181272+00:00 [error] <0.223.0> ===========
2023-04-18 08:53:07.181272+00:00 [error] <0.223.0> Error during startup: {error,previous_upgrade_failed}
2023-04-18 08:53:07.181272+00:00 [error] <0.223.0>
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>   crasher:
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     initial call: application_master:init/4
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     pid: <0.222.0>
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     registered_name: []
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     exception exit: {previous_upgrade_failed,{rabbit,start,[normal,[]]}}
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>       in function  application_master:init/4 (application_master.erl, line 142)
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     ancestors: [<0.221.0>]
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     message_queue_len: 1
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     messages: [{'EXIT',<0.223.0>,normal}]
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     links: [<0.221.0>,<0.44.0>]
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     dictionary: []
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     trap_exit: true
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     status: running
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     heap_size: 233
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     stack_size: 28
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>     reductions: 160
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>   neighbours:
2023-04-18 08:53:08.182577+00:00 [error] <0.222.0>
2023-04-18 08:53:08.182940+00:00 [notice] <0.44.0> Application rabbit exited with reason: {previous_upgrade_failed,{rabbit,start,[normal,[]]}}
2023-04-18 08:53:17.506658+00:00 [notice] <0.223.0> Logging: configured log handlers are now ACTIVE
2023-04-18 08:53:19.192713+00:00 [error] <0.223.0> Found lock file at /var/vcap/store/rabbitmq/mnesia/db/schema_upgrade_lock.
2023-04-18 08:53:19.192713+00:00 [error] <0.223.0>             Either previous upgrade is in progress or has failed.
2023-04-18 08:53:19.192713+00:00 [error] <0.223.0>             Database backup path: /var/vcap/store/rabbitmq/mnesia/db-upgrade-backup

Neither upgrade-all-service-instances errand nor cf upgrade-service command could fix the issue. So the solution is to remove the failing node from the cluster and manually add it back to the cluster.

Environment

Product Version: 2.1

Resolution

The following steps can be followed to delete the node from cluster and added it back.
  1. On any running node run rabbitmqctl forget_cluster_node <failing node name>
  2. On failing node run rm -rf /var/vcap/store/rabbitmq/mnesia/*. This will remove the contents of the mnesia db on the failing node. Once the failing node is added back to the cluster it should sync with other running nodes on the cluster
  3. Run bosh recreate <failing node> to get a working node without joining the cluster. monit summary should all jobs running
  4. On failing node
  • monit unmonitor rabbitmq-server
  • rabbitmqctl stop_app
  • rabbitmqctl reset
  • rabbitmqctl join_cluster <any of running node name>
  • update file /var/vcap/store/rabbitmq/mnesia/rabbit@xxxx.bosh-feature_flags to make content same as that of running nodes
  • rabbitmqctl start_app
  • monit monitor rabbitmq-server