Failed to start or stop RabbitMQ service


Article ID: 371316


Products

VMware Aria Suite

Issue/Introduction

On one or more nodes of the vIDM cluster, the Integrated Components section shows the error "There was a problem with Messaging service. Error retrieving rabbitmq status." and reports "Messaging Connection --- Connection test failed". The command "rabbitmqctl stop_app" also fails.

Environment

A clustered vIDM 3.3.7 environment in which KB 367757 has already been attempted to restore the RabbitMQ service, but the RabbitMQ service still cannot be started or stopped.

Cause

The RabbitMQ service is stuck in either a terminating or a starting state, so "rabbitmqctl start_app" and "rabbitmqctl stop_app" cannot complete successfully and the messaging service cannot be restored.

The following log snippets related to messaging/RabbitMQ can be seen in the RabbitMQ log and in horizon.log.

Note: The following log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.

#rabbitmq@<vidm-node>.log :
2024-06-25 02:19:52.738969+00:00 [info] <0.440.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2024-06-25 02:19:52.739338+00:00 [warning] <0.440.0> Message store "628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent": rebuilding indices from scratch
2024-06-25 02:19:52.742519+00:00 [error] <0.370.0> ** Generic server <0.370.0> terminating

 


Snippets related to RabbitMQ from horizon.log:
2024-06-25T16:39:59,069 ERROR (Calc-V2-Tenant-Update-Notifier[user-provisioning]) [;;;] com.vmware.horizon.messaging.provider.rabbitmq.MessagingUtils - Setting up messaging provider connection failed.
java.net.ConnectException: Connection refused (Connection refused)
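To check quickly whether a node is logging the same messaging failure, the error text from the horizon.log excerpt above can be counted with grep. This is a minimal sketch: the helper name count_messaging_errors is hypothetical, and the log file path is passed as an argument because its location is not stated here.

```shell
# Sketch only: count lines in a horizon.log file that contain the
# messaging-provider failure shown in the excerpt above.
# count_messaging_errors is a hypothetical helper name; pass the path
# to horizon.log as the first argument.
count_messaging_errors() {
    # grep -c prints the number of matching lines (0 if none match).
    grep -c 'Setting up messaging provider connection failed' "$1"
}

# Example:
#   count_messaging_errors /path/to/horizon.log
```

A non-zero count alongside "Connection refused" entries suggests the node cannot reach its local RabbitMQ listener.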

Resolution

 1. Remove the Elasticsearch data:
 • Run "service elasticsearch stop" on each node.
 • Run "rm -rf /db/elasticsearch/horizon" on each node.
 • Do not start elasticsearch yet; leave it stopped.


 2. Reset RabbitMQ on each node:
 • Stop the horizon service by running the command "service horizon-workspace stop".
 • Run "rabbitmqctl stop_app". In this case "rabbitmqctl reset" did not work and "rabbitmqctl stop_app" also failed on one of the vIDM nodes, because the RabbitMQ service was stuck in either a starting or a stopping state from its previous run. To resolve this:
    • Find the PID on port 25672 (the Erlang distribution server port, used by RabbitMQ for inter-node and CLI tools communication) by running "sudo lsof -i :25672".
    • Kill the PID by running "sudo kill <PID>".
    • Start the service by running "/etc/init.d/rabbitmqctl start".
    • Run "rabbitmqctl stop_app" again. After the steps above, it executed successfully.
 • rabbitmqctl force_reset
 • rabbitmqctl start_app
 • service horizon-workspace start
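The port-release part of step 2 can be sketched as a small shell helper. This is only an illustration under assumptions: the function names pids_on_port and release_port and the DRY_RUN variable are hypothetical, not part of RabbitMQ or vIDM. It relies on "lsof -t", which prints bare PIDs, and by default it only prints the kill command it would run.

```shell
# Sketch only: free the Erlang distribution port (25672) held by a stuck
# RabbitMQ process. pids_on_port, release_port and DRY_RUN are
# hypothetical names introduced for this example.

# Print the unique PID(s) with a socket on the given port, one per line.
# lsof -t prints bare PIDs; -i :PORT selects sockets on that port.
pids_on_port() {
    lsof -t -i ":$1" 2>/dev/null | sort -u
}

# Send SIGTERM to every PID holding the port.
# With DRY_RUN=1 (the default here) it only prints what it would do.
release_port() {
    port="$1"
    pids="$(pids_on_port "$port")"
    if [ -z "$pids" ]; then
        echo "no process found on port $port"
        return 1
    fi
    for pid in $pids; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: kill $pid"
        else
            kill "$pid"
        fi
    done
}

# Example (run as root so lsof can see the process):
#   DRY_RUN=1 release_port 25672   # preview only
#   DRY_RUN=0 release_port 25672   # actually send SIGTERM
```

Previewing first is a deliberate choice: lsof lists every process with a socket on that port, so it is worth confirming that the PID belongs to the stuck RabbitMQ process before killing it.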


Once "horizon-workspace" is confirmed to be running on all the vIDM nodes, start the opensearch service on all the nodes by running the command "service opensearch start".


Validate that the vIDM cluster is healthy and that there are no errors related to the messaging service.
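As one possible command-line check alongside the console, the local RabbitMQ node can be asked for its status. This is a minimal sketch, assuming rabbitmqctl is on the PATH of the user running it; check_rabbitmq is a hypothetical helper name.

```shell
# Sketch only: report whether the local RabbitMQ node answers CLI requests.
# check_rabbitmq is a hypothetical helper name.
check_rabbitmq() {
    # "rabbitmqctl status" exits non-zero when the node cannot be reached.
    if rabbitmqctl status >/dev/null 2>&1; then
        echo "rabbitmq: OK"
    else
        echo "rabbitmq: NOT RESPONDING"
        return 1
    fi
}

# Example, run on each vIDM node:
#   check_rabbitmq
#   rabbitmqctl cluster_status   # shows cluster membership as seen by this node
```

This only confirms that the broker responds on the node where it is run. The Messaging Connection test in the console remains the check described in this article.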


Note: Before executing the above steps, make sure a snapshot has been taken of all the vIDM nodes in the cluster.