RabbitMQ for Kubernetes fails to load the Mnesia database due to PVC issues

Article ID: 293152

Products

VMware RabbitMQ

Issue/Introduction


When you check the status of the RabbitMQ pods, you notice that none of them are ready: one is restarting repeatedly and the others are stuck in the Init state.
 

NAME           READY   STATUS     RESTARTS      AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
rmq-server-0   0/1     Running    4 (43s ago)   45m   10.131.3.198   worker4.domain.com   <none>           <none>
rmq-server-1   0/1     Init:0/1   0             45m   <none>         worker2.domain.com   <none>           <none>
rmq-server-2   0/1     Init:0/1   0             45m   <none>         worker0.domain.com   <none>           <none>

The logs of the running pod show that it is waiting for Mnesia tables:
 

2023-02-17 01:59:46.628071+00:00 [info] <0.221.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>     supervisor: {local,inet_tcp_compress_dist_conn_sup}
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>     errorContext: child_terminated
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>     reason: setup_timer_timeout
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>     offender: [{pid,<0.625.0>},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {id,{undefined,false,#Ref<0.230854802.2669936641.148603>}},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {mfargs,
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                    {inet_tcp_compress_dist,dist_proc_start_link,undefined}},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {restart_type,temporary},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {significant,false},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {shutdown,5000},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {child_type,worker}]


Check the pod events:

kubectl describe pod rmq-server-0

Events:
  Type     Reason              Age                  From                                Message
  ----     ------              ----                 ----                                -------
  Normal   Scheduled           32m                  default-scheduler                   Successfully assigned default/wordpress-6c6794cb7d-cdnsc to tcp-md-0-7f67dbbfb8-lthnt
  Warning  FailedAttachVolume  32m                  attachdetach-controller             Multi-Attach error for volume "###-#####-####-####-###-#######" Volume is already exclusively attached to one node and can't be attached to another
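
On a busy pod the events list can be long, so it may help to filter it down to volume-related warnings. The following is a minimal sketch, not part of the original article: the helper name volume_warnings is hypothetical, and it assumes the event reasons of interest are FailedAttachVolume (as shown above) and FailedMount.

```shell
# Keep only volume attach/mount warnings from text read on stdin.
# Returns success even when nothing matches, so it is safe under `set -e`.
volume_warnings() {
  grep -E 'FailedAttachVolume|FailedMount' || true
}

# Possible usage against a live cluster (pod name taken from the output above):
#   kubectl describe pod rmq-server-0 | volume_warnings
#   kubectl get events --field-selector reason=FailedAttachVolume
```

If the filter prints nothing, the pod has no recorded attach or mount warnings and the cause is likely elsewhere.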


From our documentation:
 

In order for RabbitMQ nodes to retain data between Pod restarts, node's data directory must use durable storage. A Persistent Volume must be attached to each RabbitMQ Pod.


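The pods cannot start because their volumes fail to attach. The Multi-Attach error in the events means the persistent volume is still exclusively attached to a different node.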
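
To see which node currently holds a volume, one option is to inspect the cluster's VolumeAttachment objects. This is a sketch under stated assumptions: the storage is CSI-based (otherwise no VolumeAttachment objects may exist), `kubectl get volumeattachment` prints its default columns (NAME, ATTACHER, PV, NODE, ATTACHED, AGE), and the helper name nodes_for_pv and the PV name in the usage comment are hypothetical.

```shell
# Print the node(s) a given PV is attached to.
# Reads `kubectl get volumeattachment` output on stdin and assumes the
# default column order: NAME ATTACHER PV NODE ATTACHED AGE.
nodes_for_pv() {
  awk -v pv="$1" 'NR > 1 && $3 == pv { print $4 }'
}

# Possible usage against a live cluster:
#   kubectl get pvc                      # find the PV bound to each RabbitMQ PVC (VOLUME column)
#   kubectl get volumeattachment | nodes_for_pv <pv-name>
```

If the node printed differs from the node the pod is scheduled on, that matches the Multi-Attach error described above.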








Resolution

How you resolve this issue depends on your Kubernetes platform. Consult your platform's documentation or support, as this is outside the scope of RabbitMQ for Kubernetes. The following are sample troubleshooting resources for various Kubernetes platforms.

If you are using VMware TKG/TKGi, refer to this KB article:
https://knowledge.broadcom.com/external/article/327470
 
If you are using AWS:
https://portworx.com/blog/warning-failedattachvolume-warning-failedmount-kubernetes-aws-ebs/

If you are using OpenShift:
https://docs.openshift.com/container-platform/4.8/support/troubleshooting/troubleshooting-installations.html