Cluster join operation fails at "Finishing rabbitmq-server" step
search cancel

Cluster join operation fails at "Finishing rabbitmq-server" step

book

Article ID: 321117

calendar_today

Updated On:

Products

VMware Aria Suite

Issue/Introduction

Symptoms:

  • The cluster join procedure is stuck at the step "Finishing rabbitmq-server" for more than 45 minutes.
  • Running the "rabbitmqctl cluster_status" command returns an output similar to:
Error: unable to connect to nodes [rabbit@company]: nodedown

DIAGNOSTICS
===========
attempted to contact: [rabbit@company]

rabbit@company:
  * unable to connect to epmd (port 4369) on company: nxdomain (non-existing domain)

current node details:
- node name: 'rabbitmq-cli-1@node'
- home dir: /var/lib/rabbitmq
- cookie hash: 4+kP1tKnxGYaGjrPL2C8bQ==



Environment

VMware vRealize Automation 7.x

Cause

This issue occurs due to a known issue with the RabbitMQ service. RabbitMQ uses short host names for vRealize Automation appliances by default, which might prevent nodes from resolving one another.

For more information see VMware Docs.

Resolution

This is a known issue affecting VMware vRealize Automation 7.5.x.

Currently, there is no resolution.

Workaround:
For all vRealize Automation appliances in the deployment, log in as root to a console session.

1. Stop the RabbitMQ service.
  service rabbitmq-server stop

2. Open the following file in a text editor.
  /etc/rabbitmq/rabbitmq-env.conf

3. Set the following property to true.
  USE_LONGNAME=true

4. Save and close the rabbitmq-env.conf.

5. Reset RabbitMQ.
  vcac-vami rabbitmq-cluster-config reset-rabbitmq-node

6. On just one vRealize Automation appliance node, run the following script.
  vcac-config cluster-config-ping-nodes --services rabbitmq-server

7. On all nodes, verify that the RabbitMQ service is started.
  vcac-vami rabbitmq-cluster-config get-rabbitmq-status