Check the RabbitMQ pod status and notice that some pods are not in a Running state:
NAME           READY   STATUS     RESTARTS      AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
rmq-server-0   0/1     Running    4 (43s ago)   45m   10.131.3.198   worker4.domain.com   <none>           <none>
rmq-server-1   0/1     Init:0/1   0             45m   <none>         worker2.domain.com   <none>           <none>
rmq-server-2   0/1     Init:0/1   0             45m   <none>         worker0.domain.com   <none>           <none>
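A listing like the one above can be produced with a command along these lines (the namespace and label selector are assumptions; adjust them to match where and how your RabbitMQ cluster is deployed):

```shell
# List the RabbitMQ pods with node placement details.
# "-n default" is an assumption; use the namespace of your RabbitMQ cluster.
kubectl get pods -n default -o wide
```

The -o wide output matters here because it shows which node each pod was scheduled to, which is relevant when a volume is stuck attached to a different node.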
Check the logs of the one pod that reached a Running state:
2023-02-17 01:59:46.628071+00:00 [info] <0.221.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>     supervisor: {local,inet_tcp_compress_dist_conn_sup}
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>     errorContext: child_terminated
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>     reason: setup_timer_timeout
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>     offender: [{pid,<0.625.0>},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {id,{undefined,false,#Ref<0.230854802.2669936641.148603>}},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {mfargs,
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                    {inet_tcp_compress_dist,dist_proc_start_link,undefined}},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {restart_type,temporary},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {significant,false},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {shutdown,5000},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {child_type,worker}]
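A sketch of how to pull these logs, assuming the pod names shown earlier and that the cluster runs in the current namespace:

```shell
# Logs from the pod that reached Running state.
kubectl logs rmq-server-0

# Pods stuck in Init:0/1 never started the main container; include every
# container (init containers too) to see why initialization is blocked.
kubectl logs rmq-server-1 --all-containers
```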
Check the pod's events:
kubectl describe pod rmq-server-0

Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Normal   Scheduled           32m   default-scheduler        Successfully assigned default/wordpress-6c6794cb7d-cdnsc to tcp-md-0-7f67dbbfb8-lthnt
  Warning  FailedAttachVolume  32m   attachdetach-controller  Multi-Attach error for volume "###-#####-####-####-###-#######" Volume is already exclusively attached to one node and can't be attached to another
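The same events can also be queried directly, which is easier to script than parsing describe output; a sketch, assuming the pod name rmq-server-0:

```shell
# Pull only the events that belong to this pod, newest last.
kubectl get events \
  --field-selector involvedObject.name=rmq-server-0 \
  --sort-by=.lastTimestamp
```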
From the RabbitMQ for Kubernetes documentation:
In order for RabbitMQ nodes to retain data between Pod restarts, the node's data directory must use durable storage. A Persistent Volume must be attached to each RabbitMQ Pod.
The pods are unable to start successfully because their Persistent Volumes cannot be attached.
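Before escalating, it can help to confirm the volume state from the Kubernetes side. A sketch, assuming the PVC naming convention persistence-<pod-name> used by the RabbitMQ Cluster Operator (verify the actual claim names with the first command):

```shell
# Check whether the claims are Bound and which PVs back them.
kubectl get pvc
kubectl describe pvc persistence-rmq-server-0

# VolumeAttachment objects record which node each PV is currently attached
# to; a stale attachment to a previous node is the usual cause of the
# Multi-Attach error seen in the pod events.
kubectl get volumeattachments
```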
Resolving this issue depends on your Kubernetes platform. Coordinate with your Kubernetes platform's documentation or support team, as volume attachment is out of scope for RabbitMQ for Kubernetes. Here are sample troubleshooting references for various Kubernetes platforms:
If you are using VMware TKG/TKGi, refer to this KB:
https://knowledge.broadcom.com/external/article/327470
If you are using AWS:
https://portworx.com/blog/warning-failedattachvolume-warning-failedmount-kubernetes-aws-ebs/
If you are using OpenShift:
https://docs.openshift.com/container-platform/4.8/support/troubleshooting/troubleshooting-installations.html
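Regardless of platform, when the node that previously held the volume is permanently gone, the attach/detach controller needs the stale attachment released before the volume can move. A cautious sketch with placeholder names (coordinate with your platform team before acting; the exact recovery procedure is platform-specific):

```shell
# Find the attachment pinning the volume to the old node
# (<pv-name> is a placeholder for the PV shown by "kubectl get pvc").
kubectl get volumeattachments | grep <pv-name>

# If the old node no longer exists, draining/removing it from the cluster
# lets the controller detach the volume so it can attach elsewhere.
kubectl drain <old-node> --ignore-daemonsets --delete-emptydir-data
```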