Resolve RabbitMQ cluster issues in vRA 8.x deployment

Article ID: 319575


Updated On: 01-29-2025

Products

VMware Aria Suite

Issue/Introduction

  • The following symptoms are noticed:
    • Deployment requests fail with errors such as:
      • Failed to publish event to topic: Deployment resource action requested
      • Failed to publish event to topic: Deployment requested
    • Requests do not proceed and remain stuck in different life-cycle states for a long time until a time-out is reached.
    • All deployment requests start failing and a restart of the node(s) is necessary to bring the environment back.
    • An alert is raised every 10-14 days from VMware Aria Operations (vROps): Description: Aria Automation is Down. Object Name: ebs
    • The following entries appear in the EBS app-server logs (a log search sketch follows the excerpt below):

       The mapper [reactor.rabbitmq.Receiver$ChannelCreationFunction] returned a null value.

      computing metrics in newChannel: null
      2023-11-01T09:14:03.038Z DEBUG event-broker [host='ebs-app-5c66ffc6df-cmtjb' thread='main-pool-35' user='' org='' trace='1XXXXX8-9XXX6-4XXa-bXX5-aXXXXXXXXXXXc' request-trace=''] c.v.a.e.b.s.EventBrokerConfiguration.lambda$initialize$0:123 - Operator Error: (NullPointerException) The mapper [reactor.rabbitmq.Receiver$ChannelCreationFunction] returned a null value.
         java.lang.NullPointerException: The mapper [reactor.rabbitmq.Receiver$ChannelCreationFunction] returned a null value.
         at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:115)
         at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2400)
         at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.request(FluxMapFuseable.java:171)
         at io.opentracing.contrib.reactor.TracedSubscrib
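
      To confirm the symptom, you can search the event-broker (ebs-app) pod logs for this error. A minimal sketch, assuming the prelude namespace and the app=ebs-app label used in the workaround below (the grep pattern is only an illustration):

      kubectl -n prelude logs --selector=app=ebs-app --tail=-1 | grep -i "returned a null value"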

     



Environment

  • VMware Aria Automation 8.x

Cause

  • Suspending a vRA node, or network partitioning between the vRA nodes in a clustered deployment, results in connectivity issues between the RabbitMQ cluster members, which can leave RabbitMQ in a de-clustered state.
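
A quick way to check whether the cluster currently reports a partition, as a sketch only (it reuses the prelude namespace and the rabbitmq-ha-0 pod name from the workaround below; the grep window is just an illustration):

kubectl -n prelude exec rabbitmq-ha-0 -- rabbitmqctl cluster_status | grep -i -A3 partitions

An empty partitions list, together with all cluster members listed under "running_nodes", indicates a healthy cluster.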

Resolution

  • There is no resolution for this issue at the moment, as it depends on the RabbitMQ cluster resilience.
  • Workaround:

To work around the issue, reset the RabbitMQ cluster:

  1. SSH login to one of the nodes in the vRA cluster.
  2. Check the rabbitmq-ha pods status:
root@vra-node [ ~ ]# kubectl -n prelude get pods --selector=app=rabbitmq-ha
NAME            READY   STATUS    RESTARTS   AGE
rabbitmq-ha-0   1/1     Running   0          3d16h
rabbitmq-ha-1   1/1     Running   0          3d16h
rabbitmq-ha-2   1/1     Running   0          3d16h
  3. If all rabbitmq-ha pods are healthy, check the RabbitMQ cluster status for each of them:
seq 0 2 | xargs -n 1 -I {} kubectl exec -n prelude rabbitmq-ha-{} -- bash -c "rabbitmqctl cluster_status"

NOTE: Analyze the command output for each RabbitMQ node and verify that the "running_nodes" list contains all cluster members from the "nodes > disc" list (an optional scripted check is sketched after this procedure):
[{nodes,
  [{disc,
    ['rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local',
     'rabbit@rabbitmq-ha-1.rabbitmq-ha-discovery.prelude.svc.cluster.local',
     'rabbit@rabbitmq-ha-2.rabbitmq-ha-discovery.prelude.svc.cluster.local']}]},
 {running_nodes,
  ['rabbit@rabbitmq-ha-2.rabbitmq-ha-discovery.prelude.svc.cluster.local',
   'rabbit@rabbitmq-ha-1.rabbitmq-ha-discovery.prelude.svc.cluster.local',
   'rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local']}
 ...]
  4. If the "running_nodes" list does not contain all RabbitMQ cluster members, RabbitMQ is in a de-clustered state and needs to be manually reconfigured. For example:
[{nodes,
 [{disc,
['rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local',
'rabbit@rabbitmq-ha-1.rabbitmq-ha-discovery.prelude.svc.cluster.local',
'rabbit@rabbitmq-ha-2.rabbitmq-ha-discovery.prelude.svc.cluster.local']}]},
{running_nodes,
['rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local']}
..]
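
As an optional scripted check, a sketch only (it assumes the same three rabbitmq-ha-0 to rabbitmq-ha-2 pod names and the prelude namespace; the grep window is just an illustration), you can print the "running_nodes" section of every member side by side and compare it with the "disc" list:

seq 0 2 | xargs -n 1 -I {} sh -c 'echo "--- rabbitmq-ha-{} ---"; kubectl exec -n prelude rabbitmq-ha-{} -- rabbitmqctl cluster_status | grep -A3 running_nodes'

Any member whose "running_nodes" output is shorter than the "disc" list is out of the cluster.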

To reconfigure the RabbitMQ cluster, complete the steps below:
  1. SSH login to one of the vRA nodes.
  2. Reconfigure the RabbitMQ cluster: "vracli reset rabbitmq"
root@vra_node [ ~ ]# vracli reset rabbitmq
'reset rabbitmq' is a destructive command. Type 'yes' if you want to continue, or 'no' to stop: yes
  3. Wait until all rabbitmq-ha pods are re-created and healthy: "kubectl -n prelude get pods --selector=app=rabbitmq-ha"
NAME            READY   STATUS    RESTARTS   AGE
rabbitmq-ha-0   1/1     Running   0          9m53s
rabbitmq-ha-1   1/1     Running   0          9m35s
rabbitmq-ha-2   1/1     Running   0          9m14s
  4. Delete the ebs pods: "kubectl -n prelude delete pods --selector=app=ebs-app".
  5. Wait until all ebs pods are re-created and ready: "kubectl -n prelude get pods --selector=app=ebs-app".
NAME                       READY   STATUS    RESTARTS   AGE
ebs-app-84dd59f4f4-jvbsf   1/1     Running   0          2m55s
ebs-app-84dd59f4f4-khv75   1/1     Running   0          2m55s
ebs-app-84dd59f4f4-xthfs   1/1     Running   0          2m55s
  6. The RabbitMQ cluster is reconfigured. Request a new Deployment to verify that it completes successfully.
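
Optionally, as a final verification sketch (it reuses the commands from the diagnostics above; the grep pattern is only an illustration), confirm that the cluster again lists all members and that the ebs-app pods no longer log the mapper error:

kubectl -n prelude exec rabbitmq-ha-0 -- rabbitmqctl cluster_status
kubectl -n prelude logs --selector=app=ebs-app --tail=200 | grep -i "returned a null value"

No output from the grep command is expected on a healthy system.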