Files under "*_internal_osr_messages" pile up and are not deleted automatically.
These are .index and .segment files.
Affected versions: RabbitMQ 4.0.x and 3.13.x.
The issue has been identified as a bug in the latest version and will be fixed in an upcoming release; there is no ETA for this at the moment.
Warm Standby Replication can now also be set up without Kubernetes.
Please refer to the documentation for the product line and platform you are running.
Troubleshooting:
1. First, verify whether this is a configuration issue.
These files belong to the internal OSR streams that are created once WSR (Warm Standby Replication) is enabled. They are the streams that accumulate replicated messages to be republished upon standby promotion.
Users normally need a retention policy configured so that enough data stays replicated without running out of disk space.
The relevant configuration parameters on the RabbitMQ side are the retention settings for the replicated messages.
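The exact key names depend on the product line and version, so treat the following as a sketch and verify it against the Warm Standby Replication documentation for your deployment; the values match the rabbitmqctl_report excerpt further down.

# Sketch only -- confirm the key names for your product line and version
standby.replication.retention.size_limit.messages = 5000000000
standby.replication.retention.time_limit.messages = 12h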
If you have collected an environment bundle for RabbitMQ on Kubernetes, you will find the above configuration under:
/<logbundle>/rabbitmq/etc/conf.d/##-userDefinedConfiguration.conf
The same settings also appear in the rabbitmqctl_report:
{retention,
 [{size_limit,[{messages,5000000000}]},
  {time_limit,[{messages,"12h"}]}]}
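To locate these settings quickly in a collected bundle, a plain-text search of the report is usually enough; the path below is illustrative and mirrors the bundle layout shown above.

# Illustrative path -- adjust to the actual bundle layout
grep -A 3 retention /<logbundle>/rabbitmq/rabbitmqctl_report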
If no such configuration was applied, add the configuration above and restart the cluster.
Then watch whether the issue resolves; a sketch of the steps follows below.
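On a non-Kubernetes deployment this could look like the following sketch; the conf.d file name and the restart command are examples and depend on how RabbitMQ is installed and managed.

# Example only: add the retention settings as an extra conf.d file,
# then restart RabbitMQ on each cluster node (rolling restart)
cat >> /etc/rabbitmq/conf.d/90-standby-retention.conf <<'EOF'
standby.replication.retention.size_limit.messages = 5000000000
standby.replication.retention.time_limit.messages = 12h
EOF
systemctl restart rabbitmq-server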
2. If the configuration was already applied, retention is most likely not taking effect. You can try to recover by manually deleting the files.
The files can be removed manually. If tens of GB have already accumulated, the customer can perform the actions below; a sketch with placeholder names follows the log sample.
rabbitmq-streams restart_stream ...
[osiris_log:evaluate_retention/2] ([{max_bytes,50000000000}]) completed in 5.501000ms
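The evaluate_retention entry above is the kind of log line expected after the restart. A sketch of the sequence, using placeholder vhost and stream names (the stream name follows the *_internal_osr_messages pattern described earlier):

# Placeholders below -- substitute the actual vhost and stream name
rabbitmq-streams stream_status --vhost <vhost> <name>_internal_osr_messages
# Restarting the stream should cause retention to be re-evaluated (see the log sample above)
rabbitmq-streams restart_stream --vhost <vhost> <name>_internal_osr_messages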
3. Monitor the cluster, collect logs, and identify from the logs whether retention kicks in at all.
Please note that once the debug level is set, the logs fill up and rotate very quickly, so make sure to review a time range longer than just a minute or so.
Collect as many logs as possible to evaluate the issue; a sketch of the commands follows below.
If you cannot find any log entry like the sample above, the cluster is most likely affected by the bug.
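A sketch of how this check could be done; the log path is an example for a standard Linux package install, and the log level should be reverted once enough data has been captured.

# Temporarily raise the log level (output grows very quickly, revert afterwards)
rabbitmqctl set_log_level debug
# Look for retention evaluation entries similar to the sample above
grep evaluate_retention /var/log/rabbitmq/*.log
# Restore the normal log level when done
rabbitmqctl set_log_level info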
Current Resolution:
The long-term fix will be included in an upcoming version, but there is no exact ETA for it yet.
The workaround is to manually remove the old files from the downstream cluster node until the release containing the fix is GA.
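A minimal sketch of that cleanup, assuming the default layout where stream data lives under the node data directory; the paths below are examples, the commands must be run on the downstream node, and only the oldest .segment/.index pairs should be removed, never the most recent ones.

# Example paths only -- locate the internal OSR stream directory on the downstream node
find /var/lib/rabbitmq/mnesia -type d -name "*internal_osr_messages*"
# List files oldest-first and remove only the oldest .segment/.index pairs
ls -ltr <stream_directory>/*.segment <stream_directory>/*.index
rm <stream_directory>/<oldest>.segment <stream_directory>/<oldest>.index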