RabbitMQ for Kubernetes fails to load the Mnesia database due to PVC issues

Products

VMware RabbitMQ

Issue/Introduction

When you check the status of the RabbitMQ pods (kubectl get pods -o wide), you notice that none of them are ready:

NAME           READY   STATUS     RESTARTS      AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
rmq-server-0   0/1     Running    4 (43s ago)   45m   10.131.3.198   worker4.domain.com   <none>           <none>
rmq-server-1   0/1     Init:0/1   0             45m   <none>         worker2.domain.com   <none>           <none>
rmq-server-2   0/1     Init:0/1   0             45m   <none>         worker0.domain.com   <none>           <none>
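With more pods, it helps to filter this table for pods whose READY column is not fully ready. The sketch below is a hypothetical helper, not part of any tool; it assumes NAME, READY and STATUS are the first three columns, which is the case for both the plain and -o wide output of kubectl get pods.

```shell
# not_ready_pods (hypothetical helper): read `kubectl get pods` output on
# stdin and print NAME, READY and STATUS for every pod whose READY column
# is not n/n (for example "0/1"). Assumes columns 1-3 are NAME READY STATUS.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")        # "0/1" -> ready[1] = "0", ready[2] = "1"
    if (ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage (replace <namespace> with the RabbitMQ cluster namespace):
#   kubectl get pods -o wide -n <namespace> | not_ready_pods
```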

Check the logs of the running pod, rmq-server-0 (kubectl logs rmq-server-0):

2023-02-17 01:59:46.628071+00:00 [info] <0.221.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>     supervisor: {local,inet_tcp_compress_dist_conn_sup}
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>     errorContext: child_terminated
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>     reason: setup_timer_timeout
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>     offender: [{pid,<0.625.0>},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {id,{undefined,false,#Ref<0.230854802.2669936641.148603>}},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {mfargs,
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                    {inet_tcp_compress_dist,dist_proc_start_link,undefined}},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {restart_type,temporary},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {significant,false},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {shutdown,5000},
2023-02-17 01:59:57.454937+00:00 [error] <0.248.0>                {child_type,worker}]
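Raw kubectl logs output from RabbitMQ can contain ANSI colour sequences, which show up as fragments such as [38;5;160m and [0m when copied as plain text. The sketch below strips them and keeps only the Mnesia wait messages and error lines. strip_ansi and mnesia_symptoms are hypothetical helper names; --previous is the standard kubectl logs flag for reading the log of the previous (restarted) container.

```shell
# strip_ansi (hypothetical helper): remove ANSI SGR colour sequences
# (ESC [ <digits/semicolons> m) from stdin.
strip_ansi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*m//g"
}

# mnesia_symptoms (hypothetical helper): strip colours, then keep only the
# "Waiting for Mnesia tables" messages and [error] lines.
mnesia_symptoms() {
  strip_ansi | grep -E 'Waiting for Mnesia tables|\[error]'
}

# Usage:
#   kubectl logs rmq-server-0 -n <namespace} | mnesia_symptoms
#   kubectl logs rmq-server-0 -n <namespace> --previous | mnesia_symptoms
```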

Check the pod's events:

kubectl describe pod rmq-server-0

Events:
  Type     Reason              Age                  From                                Message
  ----     ------              ----                 ----                                -------
  Normal   Scheduled           32m                  default-scheduler                   Successfully assigned default/wordpress-6c6794cb7d-cdnsc to tcp-md-0-7f67dbbfb8-lthnt
  Warning  FailedAttachVolume  32m                  attachdetach-controller             Multi-Attach error for volume "###-#####-####-####-###-#######" Volume is already exclusively attached to one node and can't be attached to another
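When several pods are affected, the volume names can be extracted from the Multi-Attach error messages instead of read by eye. The sketch below is a hypothetical helper that parses kubectl describe pod or kubectl get events output; the usage comment also lists the same events through the standard reason field selector for events.

```shell
# multi_attach_volumes (hypothetical helper): read `kubectl describe pod`
# or `kubectl get events` output on stdin and print each volume named in a
# "Multi-Attach error for volume" message, de-duplicated.
multi_attach_volumes() {
  sed -n 's/.*Multi-Attach error for volume "\([^"]*\)".*/\1/p' | sort -u
}

# Usage:
#   kubectl describe pod rmq-server-1 -n <namespace> | multi_attach_volumes
#   kubectl get events -n <namespace> --field-selector reason=FailedAttachVolume
```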

As the RabbitMQ for Kubernetes documentation states:

In order for RabbitMQ nodes to retain data between Pod restarts, node's data directory must use durable storage. A Persistent Volume must be attached to each RabbitMQ Pod.
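A first check is whether every PersistentVolumeClaim is Bound. A Bound claim can still fail to attach, as in the Multi-Attach event above, so this only rules out the simpler case. The sketch assumes the default kubectl get pvc column order (NAME, STATUS first); unbound_pvcs is a hypothetical helper, and the jsonpath query in the usage comment prints the claim that a given pod mounts.

```shell
# unbound_pvcs (hypothetical helper): read `kubectl get pvc` output on stdin
# and print NAME and STATUS of every claim whose STATUS is not "Bound".
# Assumes columns 1-2 are NAME STATUS.
unbound_pvcs() {
  awk 'NR > 1 && $2 != "Bound" { print $1, $2 }'
}

# Usage:
#   kubectl get pvc -n <namespace> | unbound_pvcs
#   kubectl get pod rmq-server-1 -n <namespace> \
#     -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}'
```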

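The pods cannot start successfully because their persistent volumes fail to attach. Pods that cannot attach their volume stay in Init, and a node that does start, such as rmq-server-0, keeps waiting for Mnesia tables from peers that never come up.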
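A Multi-Attach error means a volume that can attach to only one node (typically a ReadWriteOnce volume) is still attached to a node other than the one the pod was scheduled on. For CSI-backed volumes, VolumeAttachment objects show which node currently holds a PersistentVolume. The sketch below uses a hypothetical helper, attached_node, and assumes the kubectl get volumeattachment columns are NAME, ATTACHER, PV, NODE, ATTACHED, AGE.

```shell
# attached_node (hypothetical helper): read `kubectl get volumeattachment`
# output on stdin and print the NODE and ATTACHED columns for the PV named
# in $1. Assumes columns NAME ATTACHER PV NODE ATTACHED AGE.
attached_node() {
  awk -v pv="$1" 'NR > 1 && $3 == pv { print $4, $5 }'
}

# Usage:
#   pv=$(kubectl get pvc <claim-name> -n <namespace> -o jsonpath='{.spec.volumeName}')
#   kubectl get volumeattachment | attached_node "$pv"
#   kubectl get pod rmq-server-1 -n <namespace> -o jsonpath='{.spec.nodeName}'
```

If the node reported for the volume differs from the pod's nodeName, the volume is still held by the old node, and releasing it is a platform-level task covered by the resources below.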

Resolution

How you resolve this issue depends on your Kubernetes platform. Work with your Kubernetes platform documentation or support, because volume attachment is outside the scope of RabbitMQ for Kubernetes. The following resources describe troubleshooting steps for several Kubernetes platforms:

If you are using VMware TKG/TKGi, refer to this KB article:
https://knowledge.broadcom.com/external/article/327470

If you are using AWS:
https://portworx.com/blog/warning-failedattachvolume-warning-failedmount-kubernetes-aws-ebs/

If you are using OpenShift:
https://docs.openshift.com/container-platform/4.8/support/troubleshooting/troubleshooting-installations.html

Article ID: 293152
