One Aria Automation node of a 3-node cluster is down/unavailable and provisioning is not functioning
Article ID: 377795
Products
VMware Aria Suite
Issue/Introduction
Symptoms:
One Aria Automation node is down or unavailable because of infrastructure issues.
The Aria Automation portal is accessible.
VM provisioning takes a long time and eventually fails with errors about event topics, e.g.: "Failed to publish event to topic: Deployment requested"
Reviewing the Aria Automation services with the command "kubectl -n prelude get pods -o wide" shows that only the pods on one node are down.
Reviewing the RabbitMQ status with the command below shows only 1 node as a running node (Ref: Resolve RabbitMQ cluster issues in vRA 8.x deployment):
seq 0 2 | xargs -n 1 -I {} kubectl exec -n prelude rabbitmq-ha-{} -- bash -c "rabbitmqctl cluster_status"
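You can confirm this symptom without reading the full status output by extracting only the "Running Nodes" section with a small filter. This is a sketch that assumes the RabbitMQ 3.8+ text layout of `rabbitmqctl cluster_status`: a "Running Nodes" header, a blank line, one node per line, then a blank line. The function name running_nodes is hypothetical and is not part of the product.

```shell
# Hypothetical helper: print the node names listed under "Running Nodes"
# in `rabbitmqctl cluster_status` text output read from stdin.
# Assumes the RabbitMQ 3.8+ layout: header, blank line, one node per line, blank line.
running_nodes() {
  awk '
    /^Running Nodes[[:space:]]*$/ { in_section = 1; next }  # section header found
    in_section && NF == 0 && seen { exit }                  # blank line after the entries ends it
    in_section && NF > 0 { print $1; seen = 1 }             # each entry is one node name
  '
}
```

Example: `kubectl exec -n prelude rabbitmq-ha-0 -- bash -c "rabbitmqctl cluster_status" | running_nodes | wc -l`. In the state described above, this reports a single running node.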
Environment
Aria Automation 8.x
Cause
Due to the RabbitMQ isolation, the RabbitMQ application on another node was also stopped; therefore, the last working RabbitMQ node stopped handling any messages.
Resolution
Before proceeding, take a snapshot (including memory) of the 2 available nodes in vCenter.
1. Identify the Aria Automation nodes on which the RabbitMQ pods are running:
   kubectl -n prelude get pods -o wide | grep -Ei "name|rabbitmq"
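kubectl's custom-columns output can produce a more compact pod-to-node mapping. This is a sketch: the helper name rabbitmq_placement is hypothetical, and it assumes the pods keep the rabbitmq-ha-N naming used in this article. Compare the NODE column with the output of `kubectl get nodes`, where the failed appliance typically shows NotReady, to identify which rabbitmq-ha pod lives on the unavailable node.

```shell
# Hypothetical helper: list each rabbitmq-ha-N pod with the Kubernetes node it is scheduled on.
# custom-columns prints only the requested fields, which keeps the output easy to scan.
rabbitmq_placement() {
  kubectl -n prelude get pods --no-headers \
    -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName \
    | grep '^rabbitmq-ha-'
}
```

Example: run `rabbitmq_placement` followed by `kubectl get nodes` and match the NotReady node against the NODE column.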
2. Identify which pods are currently running the RabbitMQ application:
   seq 0 2 | xargs -n 1 -I {} kubectl exec -n prelude rabbitmq-ha-{} -- bash -c "rabbitmqctl cluster_status"
   E.g.: Only "rabbitmq-ha-0" is listed as running. Depending on which node is down, the RabbitMQ application has to be started on whichever of "rabbitmq-ha-1" or "rabbitmq-ha-2" runs on the remaining available node.
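The decision in this example is to start the RabbitMQ application on the pod that is neither running nor hosted on the failed node, and it can be expressed as a small function. This sketch rests on three assumptions. The helper name pods_to_start is hypothetical. The cluster has exactly the three pods rabbitmq-ha-0 to rabbitmq-ha-2. RabbitMQ node names contain the pod name right after the "@" (for example rabbit@rabbitmq-ha-0 or rabbit@rabbitmq-ha-0.<domain>).

```shell
# Hypothetical helper: print the rabbitmq-ha pod(s) that are neither running
# nor hosted on the failed node, i.e. the candidates for `rabbitmqctl start_app`.
#   $1  space-separated node names listed under "Running Nodes"
#   $2  the rabbitmq-ha pod that lives on the unavailable node
# Assumes three pods (rabbitmq-ha-0..2) and node names of the form
# rabbit@<pod> or rabbit@<pod>.<domain>.
pods_to_start() {
  running=" $1 "
  down_pod=$2
  for i in 0 1 2; do
    pod="rabbitmq-ha-$i"
    if [ "$pod" = "$down_pod" ]; then
      continue                      # pod is on the failed node; nothing to start there
    fi
    case "$running" in
      *"@$pod."* | *"@$pod "*) ;;   # already listed as a running node
      *) echo "$pod" ;;
    esac
  done
}
```

Example: `pods_to_start "rabbit@rabbitmq-ha-0" rabbitmq-ha-2` prints rabbitmq-ha-1, which matches the case described in the next step.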
3. Try to start the RabbitMQ application on the pod that is available but not listed under "Running Nodes" in the output above.
   E.g.: Node 3 of the Aria Automation cluster has the outage, and Node 2 is running but its RabbitMQ (rabbitmq-ha-1) is not reported as running:
   kubectl exec -n prelude rabbitmq-ha-1 -- bash -c "rabbitmqctl start_app"
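If you script this step, add a guard against typos in the pod name, because start_app runs against a live cluster. The sketch below wraps the exact command from the example. The helper name start_rabbit_app and the pod-name check are illustrative additions and are not part of the documented procedure.

```shell
# Hypothetical wrapper around the documented command:
#   kubectl exec -n prelude <pod> -- bash -c "rabbitmqctl start_app"
# The name check (rabbitmq-ha-0..2 only) is an added safety guard.
start_rabbit_app() {
  case "$1" in
    rabbitmq-ha-[0-2]) ;;
    *) echo "refusing to run: unexpected pod name '$1'" >&2; return 1 ;;
  esac
  kubectl exec -n prelude "$1" -- bash -c "rabbitmqctl start_app"
}
```

Example: `start_rabbit_app rabbitmq-ha-1` runs the same command as above, while a mistyped name such as rabbitmq-ha-3 is rejected before kubectl is called.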
4. Validate that 2 nodes are now reported as running, using the same command as in Step 2:
   seq 0 2 | xargs -n 1 -I {} kubectl exec -n prelude rabbitmq-ha-{} -- bash -c "rabbitmqctl cluster_status"
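The application may need some time to rejoin after start_app, so you can repeat the validation a few times before concluding that it failed. This is a sketch, not an official tool. The helper name wait_for_running_nodes and its retry arguments are hypothetical. The count relies on the assumed RabbitMQ 3.8+ "Running Nodes" layout (header, blank line, one node per line, blank line).

```shell
# Hypothetical helper: poll one pod's `rabbitmqctl cluster_status` until at least
# MIN nodes are listed under "Running Nodes" (RabbitMQ 3.8+ text layout assumed).
# Usage: wait_for_running_nodes POD MIN TRIES DELAY_SECONDS
# Returns 0 as soon as the target is reached, 1 after TRIES failed attempts.
wait_for_running_nodes() {
  pod=$1 min=$2 tries=$3 delay=$4
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    # Count the entries between the "Running Nodes" header and the next blank line.
    count=$(kubectl exec -n prelude "$pod" -- bash -c "rabbitmqctl cluster_status" 2>/dev/null |
      awk '/^Running Nodes[[:space:]]*$/ { s = 1; next }
           s && NF == 0 && n { exit }
           s && NF > 0 { n++ }
           END { print n + 0 }')
    echo "$pod, attempt $attempt: $count running node(s)"
    if [ "$count" -ge "$min" ]; then
      return 0
    fi
    if [ "$attempt" -lt "$tries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

Example: `wait_for_running_nodes rabbitmq-ha-0 2 10 15` checks up to 10 times, 15 seconds apart. It reads the cluster view from a single pod. You can run it against each available pod if their views might differ.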
5. Validate that provisioning now proceeds by submitting a new request in the Aria Automation portal.