Symptoms:
kubectl -n prelude get pods -o wide
" only 1 pods from one node are downseq 0 2 | xargs -n 1 -I {} kubectl exec -n prelude rabbitmq-ha-{} -- bash -c "rabbitmqctl cluster_status"
Aria Automation 8.x
Cluster instability may occur when one of the Aria Automation nodes goes down, which may lead to issues in the messaging queue, RabbitMQ.
Due to RabbitMQ isolation, another RabbitMQ service was stopped; therefore, the last working RabbitMQ service stopped handling any messages.
Restore the Aria Automation node. If Linux boots into the emergency console, review this article:
"Failed to start file system check on /dev/disk..." error on Photon OS based virtual appliances
To work around the issue while the node is down:
Before proceeding, take a snapshot, including memory, of the two available nodes from vCenter.
List the RabbitMQ pods:

kubectl -n prelude get pods -o wide | grep -Ei "name|rabbitmq"
Check the cluster status on each node:

seq 0 2 | xargs -n 1 -I {} kubectl exec -n prelude rabbitmq-ha-{} -- bash -c "rabbitmqctl cluster_status"
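The seq | xargs pipeline above simply runs one kubectl exec per pod, rabbitmq-ha-0 through rabbitmq-ha-2. A dry-run sketch of what it expands to (echo is used here so the commands are printed rather than executed; remove it to run against a live cluster):

```shell
# Dry-run expansion of the seq|xargs pipeline: prints the three
# kubectl exec commands it would run, one per rabbitmq-ha pod.
expand_cluster_status() {
  for i in 0 1 2; do
    echo kubectl exec -n prelude "rabbitmq-ha-$i" -- bash -c "rabbitmqctl cluster_status"
  done
}
expand_cluster_status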
Start the RabbitMQ application on the stopped node (rabbitmq-ha-1 in this example; adjust the index to match the affected pod):

kubectl exec -n prelude rabbitmq-ha-1 -- bash -c "rabbitmqctl start_app"
Verify the cluster status again on each node:

seq 0 2 | xargs -n 1 -I {} kubectl exec -n prelude rabbitmq-ha-{} -- bash -c "rabbitmqctl cluster_status"
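Once the node is restored, a quick way to confirm recovery is to check that no RabbitMQ pods remain in a non-Running state. The helper below is a hypothetical sketch, not part of the product: it filters a `kubectl get pods` listing piped on stdin and counts rabbitmq pods whose STATUS column is not Running.

```shell
# Hypothetical helper: count rabbitmq pods not in Running state.
# Feed it the pod listing, e.g.:
#   kubectl -n prelude get pods | count_not_running
count_not_running() {
  # STATUS is the third column of the default `kubectl get pods` output
  grep -E "^rabbitmq" | awk '$3 != "Running" { n++ } END { print n+0 }'
}
```

A result of 0 after `rabbitmqctl start_app` indicates all three RabbitMQ pods are back in Running state.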