- There is currently no resolution for this issue, as it depends on RabbitMQ cluster resilience.
- Workaround: To work around the issue, reset the RabbitMQ cluster:
- SSH login to one of the nodes in the vRA cluster.
- Check the rabbitmq-ha pods status:
root@vra-node [ ~ ]# kubectl -n prelude get pods --selector=app=rabbitmq-ha
NAME READY STATUS RESTARTS AGE
rabbitmq-ha-0 1/1 Running 0 3d16h
rabbitmq-ha-1 1/1 Running 0 3d16h
rabbitmq-ha-2 1/1 Running 0 3d16h
- If all rabbitmq-ha pods are healthy, check the RabbitMQ cluster status for each of them:
seq 0 2 | xargs -n 1 -I {} kubectl exec -n prelude rabbitmq-ha-{} -- bash -c "rabbitmqctl cluster_status"
NOTE: Analyze the command output for each RabbitMQ node and verify that the "running_nodes" list contains all cluster members from the "nodes > disc" list:
[{nodes,
[{disc,
['rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local','rabbit@rabbitmq-ha-1.rabbitmq-ha-discovery.prelude.svc.cluster.local',
'rabbit@rabbitmq-ha-2.rabbitmq-ha-discovery.prelude.svc.cluster.local']}]},
{running_nodes,
['rabbit@rabbitmq-ha-2.rabbitmq-ha-discovery.prelude.svc.cluster.local',
'rabbit@rabbitmq-ha-1.rabbitmq-ha-discovery.prelude.svc.cluster.local',
'rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local']}
...]
- If the "running_nodes" list doesn't contain all RabbitMQ cluster members, RabbitMQ is in a de-clustered state and needs to be manually reconfigured. For example:
[{nodes,
[{disc,
['rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local',
'rabbit@rabbitmq-ha-1.rabbitmq-ha-discovery.prelude.svc.cluster.local',
'rabbit@rabbitmq-ha-2.rabbitmq-ha-discovery.prelude.svc.cluster.local']}]},
{running_nodes,
['rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local']}
...]
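The check described in the NOTE above can be scripted. The helper below is a sketch and is not part of rabbitmqctl or vracli: it reads `rabbitmqctl cluster_status` output on stdin and compares the number of 'rabbit@...' entries listed before the "running_nodes" marker (the configured disc members) with the number listed after it (the running members).

```shell
#!/bin/sh
# Hedged sketch (hypothetical helper, not a vracli/rabbitmqctl command):
# reads `rabbitmqctl cluster_status` output on stdin and returns success
# only when every configured disc node also appears as a running node.
cluster_is_healthy() {
  status=$(cat)
  # Count 'rabbit@' entries in the disc list (lines before running_nodes).
  disc=$(printf '%s\n' "$status" | sed '/running_nodes/,$d' | grep -o "rabbit@" | wc -l)
  # Count 'rabbit@' entries from the running_nodes line onward.
  running=$(printf '%s\n' "$status" | sed -n '/running_nodes/,$p' | grep -o "rabbit@" | wc -l)
  [ "$disc" -gt 0 ] && [ "$disc" -eq "$running" ]
}
```

On the appliance you could pipe each node's status through it, for example: `kubectl exec -n prelude rabbitmq-ha-0 -- rabbitmqctl cluster_status | cluster_is_healthy && echo healthy || echo de-clustered`.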
To reconfigure the RabbitMQ cluster, complete the steps below:
- SSH login to one of the vRA nodes.
- Reconfigure the RabbitMQ cluster: "vracli reset rabbitmq"
root@vra-node [ ~ ]# vracli reset rabbitmq
'reset rabbitmq' is a destructive command. Type 'yes' if you want to continue, or 'no' to stop: yes
- Wait until all rabbitmq-ha pods are re-created and healthy: "kubectl -n prelude get pods --selector=app=rabbitmq-ha"
NAME READY STATUS RESTARTS AGE
rabbitmq-ha-0 1/1 Running 0 9m53s
rabbitmq-ha-1 1/1 Running 0 9m35s
rabbitmq-ha-2 1/1 Running 0 9m14s
- Delete the ebs pods: "kubectl -n prelude delete pods --selector=app=ebs-app".
- Wait until all ebs pods are re-created and ready: "kubectl -n prelude get pods --selector=app=ebs-app".
NAME READY STATUS RESTARTS AGE
ebs-app-84dd59f4f4-jvbsf 1/1 Running 0 2m55s
ebs-app-84dd59f4f4-khv75 1/1 Running 0 2m55s
ebs-app-84dd59f4f4-xthfs 1/1 Running 0 2m55s
- The RabbitMQ cluster is reconfigured. Request a new Deployment to verify that it completes successfully.
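The reconfiguration steps above can be chained into a single function. This is a hedged sketch: the function name is ours, `kubectl wait` is used in place of manually re-running `get pods`, and piping "yes" into the destructive `vracli reset rabbitmq` prompt skips the confirmation, so review it before any non-interactive use.

```shell
#!/bin/sh
# Hedged sketch of the manual procedure above; assumes kubectl and vracli
# are on PATH, as on a vRA appliance. Run as root on one of the vRA nodes.
reset_rabbitmq_cluster() {
  # Destructive: answers the confirmation prompt automatically.
  echo yes | vracli reset rabbitmq || return 1
  # Wait until the re-created rabbitmq-ha pods are Ready.
  kubectl -n prelude wait --for=condition=Ready pod --selector=app=rabbitmq-ha --timeout=15m || return 1
  # Recycle the ebs-app pods so they reconnect to the fresh cluster.
  kubectl -n prelude delete pods --selector=app=ebs-app || return 1
  kubectl -n prelude wait --for=condition=Ready pod --selector=app=ebs-app --timeout=15m || return 1
}
```

After the function returns successfully, request a new Deployment to verify the fix, as described above.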