VIO RabbitMQ not starting: could not write file


Article ID: 321844


Updated On:

Products

VMware Integrated OpenStack

Issue/Introduction

Symptoms:
  • When starting services, RabbitMQ fails to start
  • In the rabbitmq log, you see entries similar to:
root@photon-machine [ ~ ]# oslog rabbitmq1-rabbitmq-0
+ exec /docker-entrypoint.sh rabbitmq-server
2019-11-28 19:20:18.957 [info] <0.33.0> Application lager started on node
'rabbit@rabbitmq1-rabbitmq-0.rabbitmq1-dsv-59862a.openstack.svc.cluster.local'
2019-11-28 19:20:18.983 [error] <0.5.0>
Error description:
    init:do_boot/3
    init:start_em/1
    rabbit:start_it/1 line 478
    rabbit:'-boot/0-fun-0-'/0 line 329
    rabbit_node_monitor:prepare_cluster_status_files/0 line 129
    rabbit_node_monitor:write_cluster_status/1 line 148
throw:{error,{could_not_write_file,"/var/lib/rabbitmq/mnesia/rabbit@rabbitmq1-rabbitmq-0.rabbitmq1-dsv-59862a.openstack.svc.cluster.local/cluster_nodes.config",
                                   enospc}}
Log file(s) (may contain more information):
   <stdout>

BOOT FAILED
===========

Error description:
    init:do_boot/3
    init:start_em/1
    rabbit:start_it/1 line 478
    rabbit:'-boot/0-fun-0-'/0 line 329
    rabbit_node_monitor:prepare_cluster_status_files/0 line 129
    rabbit_node_monitor:write_cluster_status/1 line 148
throw:{error,{could_not_write_file,"/var/lib/rabbitmq/mnesia/rabbit@rabbitmq1-rabbitmq-0.rabbitmq1-dsv-59862a.openstack.svc.cluster.local/cluster_nodes.config",
                                   enospc}}
Log file(s) (may contain more information):
   <stdout>

{"init terminating in
do_boot",{error,{could_not_write_file,"/var/lib/rabbitmq/mnesia/rabbit@rabbitmq1-rabbitmq-0.rabbitmq1-dsv-59862a.openstack.svc.cluster.local/cluster_nodes.config",enospc}}}
init terminating in do_boot
({error,{could_not_write_file,/var/lib/rabbitmq/mnesia/rabbit@rabbitmq1-rabbitmq-0.rabbitmq1-dsv-59862a.openstack.svc.cluster.local/cluster_nodes.config,enospc}})

 
Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values will vary in your environment.
 


Environment

VMware Integrated OpenStack 7.x
VMware Integrated OpenStack 6.x

Cause

  • The root cause is that the 20 GB of disk space allocated to RabbitMQ has been exhausted (enospc: "no space left on device"). Most of the space is consumed by queue index files under:
~/mnesia/rabbit@rabbitmq1-rabbitmq-0.rabbitmq1-dsv-59862a.openstack.svc.cluster.local/msg_stores/vhosts/[xxxxxxxxx]/queues/[yyyyyyyy]/*.idx
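
To confirm where the space is going inside the pod, a check along these lines can help (a sketch assuming du and bash are available in the container image, which the workaround below already relies on):

osctl exec rabbitmq1-rabbitmq-0 -- bash -c 'du -sh /var/lib/rabbitmq/mnesia/*/msg_stores/vhosts/*'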

Resolution

This is a known issue affecting VMware Integrated OpenStack 6.0.

Workaround:
  1. Check disk usage of the rabbitmq pod:
osctl exec rabbitmq1-rabbitmq-0 df
 
Note: Pay attention to the Use% column of the /var/lib/rabbitmq line:

/dev/sdc 20511312 50440 20444488 1% /var/lib/rabbitmq
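
Note: For a human-readable view of the same mount, a command along these lines should also work (df -h is standard):

osctl exec rabbitmq1-rabbitmq-0 -- df -h /var/lib/rabbitmq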
  2. Open an interactive TTY to the rabbitmq pod:
osctl exec -it rabbitmq1-rabbitmq-0 /bin/bash
  3. To set the RabbitMQ message TTL, run the following commands:
for vhost in nova glance keystone neutron heat barbican cinder;do rabbitmqctl set_policy --vhost ${vhost} --priority 0 --apply-to all ha_ttl_${vhost} '(notifications)\.' '{"ha-mode":"all","ha-sync-mode":"automatic","message-ttl":70000}' ; done

rabbitmqctl set_policy TTL ".*" '{"message-ttl":70000}' --apply-to queues
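
To verify the policies took effect, you can list them per vhost (list_policies is a standard rabbitmqctl subcommand; the vhost list mirrors the loop above):

for vhost in nova glance keystone neutron heat barbican cinder; do rabbitmqctl list_policies -p ${vhost}; done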
  4. If the /dev/sdc partition is still full, clear out the *.idx files by purging the largest queues (the sorted listing below helps identify them):
rabbitmqctl list_queues
rabbitmqctl purge_queue -p <vhost> <name of queue>
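
To identify which queues are worth purging, you can sort them by message count, for example (nova is only an example vhost; -q suppresses the informational banner):

rabbitmqctl list_queues -q -p nova name messages | sort -k2 -rn | head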
  5. Restart RabbitMQ if it is stopped:
rabbitmqctl force_boot
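
After the restart, you can confirm the node is up and clustered with:

rabbitmqctl cluster_status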


Additional Information

If you cannot get into the rabbitmq pod before it restarts, follow these instructions:
  1. osctl edit statefulset rabbitmq1-rabbitmq
  2. Change the container command as follows (add 'sleep 3600' so the container stays up, giving you a window to exec in before RabbitMQ starts):
      containers:
      - command:
        - bash
        - -c
        - |
          sleep 3600
          rabbitmqctl force_boot
          /tmp/rabbitmq-start.sh
  3. Change the livenessProbe as follows (increase initialDelaySeconds from 30 to 300 to give yourself time to exec into the pod before the liveness probe can restart it):
        livenessProbe:
          exec:
            command:
            - /tmp/rabbitmq-liveness.sh
          failureThreshold: 3
          initialDelaySeconds: 300
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
  4. Save your changes
  5. osctl delete po rabbitmq1-rabbitmq-2
  6. After the new rabbitmq1-rabbitmq-2 pod comes up, remove the message stores (note: this deletes all persisted messages in those vhosts):
osctl exec -ti rabbitmq1-rabbitmq-2 bash
rm -rf /var/lib/rabbitmq/mnesia/rabbit@rabbitmq1-rabbitmq-2.rabbitmq1-dsv-59862a.openstack.svc.cluster.local/msg_stores/vhosts
exit
  7. Now use 'osctl edit statefulset rabbitmq1-rabbitmq' to revert the changes above.
  8. osctl delete po rabbitmq1-rabbitmq-2
Now the new rabbitmq1-rabbitmq-2 pod should come up and become 1/1 Running.
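
To confirm, something like the following should show the pod with STATUS Running (osctl mirrors kubectl syntax):

osctl get pods | grep rabbitmq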