Files under "*_internal_osr_messages" pile up and are not deleted automatically.
These are .index and .segment files.
Affected versions: RabbitMQ 4.0.x and 3.13.x.
The issue has been identified as a bug in the latest version and will be fixed in an upcoming release; there is no ETA for this at the moment.
Warm Standby Replication can now also be set up without Kubernetes.
Please refer to the documentation for the product line and platform you are running.
Troubleshooting:
1. First, verify whether this is a configuration issue.
These files belong to the internal OSR streams that are created once WSR (Warm Standby Replication) is enabled. They are the streams that accumulate replicated messages to be republished upon standby promotion.
Users normally need a retention policy configured so that enough data stays replicated without running out of disk space.
The relevant configuration parameters on the RabbitMQ side are the retention settings for the replicated messages.
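The exact key names depend on the product line and version, so treat the following as a sketch and verify it against the Warm Standby Replication documentation for your deployment; the values match the rabbitmqctl_report excerpt further down.

# Sketch only -- confirm the key names for your product line and version
standby.replication.retention.size_limit.messages = 5000000000
standby.replication.retention.time_limit.messages = 12h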
If you have collected an environment bundle for RabbitMQ on Kubernetes, you will find the above configuration under:
/<logbundle>/rabbitmq/etc/conf.d/##-userDefinedConfiguration.conf
The same settings also appear in the rabbitmqctl_report:
{retention,
 [{size_limit,[{messages,5000000000}]},
  {time_limit,[{messages,"12h"}]}]}
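To locate these settings quickly in a collected bundle, a plain-text search of the report is usually enough; the path below is illustrative and mirrors the bundle layout shown above.

# Illustrative path -- adjust to the actual bundle layout
grep -A 3 retention /<logbundle>/rabbitmq/rabbitmqctl_report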
If no such configuration was applied, add the configuration above and restart the cluster.
Then watch whether the issue resolves; a sketch of the steps follows below.
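On a non-Kubernetes deployment this could look like the following sketch; the conf.d file name and the restart command are examples and depend on how RabbitMQ is installed and managed.

# Example only: add the retention settings as an extra conf.d file,
# then restart RabbitMQ on each cluster node (rolling restart)
cat >> /etc/rabbitmq/conf.d/90-standby-retention.conf <<'EOF'
standby.replication.retention.size_limit.messages = 5000000000
standby.replication.retention.time_limit.messages = 12h
EOF
systemctl restart rabbitmq-server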
2. If the configuration was already applied, retention is most likely not taking effect. You can try to recover by manually deleting the files.
The files can be removed manually. If tens of GB have already accumulated, the customer can perform the actions below; a sketch with placeholder names follows the log sample.
rabbitmq-streams restart_stream ...
[osiris_log:evaluate_retention/2] ([{max_bytes,50000000000}]) completed in 5.501000ms
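The evaluate_retention entry above is the kind of log line expected after the restart. A sketch of the sequence, using placeholder vhost and stream names (the stream name follows the *_internal_osr_messages pattern described earlier):

# Placeholders below -- substitute the actual vhost and stream name
rabbitmq-streams stream_status --vhost <vhost> <name>_internal_osr_messages
# Restarting the stream should cause retention to be re-evaluated (see the log sample above)
rabbitmq-streams restart_stream --vhost <vhost> <name>_internal_osr_messages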
3. Monitor the cluster, collect logs, and identify from the logs whether retention kicks in at all.
Please note that once the debug level is set, the logs fill up and rotate very quickly, so make sure to review a time range longer than just a minute or so.
Collect as many logs as possible to evaluate the issue; a sketch of the commands follows below.
If you cannot find any log entry like the sample above, the cluster is most likely affected by the bug.
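A sketch of how this check could be done; the log path is an example for a standard Linux package install, and the log level should be reverted once enough data has been captured.

# Temporarily raise the log level (output grows very quickly, revert afterwards)
rabbitmqctl set_log_level debug
# Look for retention evaluation entries similar to the sample above
grep evaluate_retention /var/log/rabbitmq/*.log
# Restore the normal log level when done
rabbitmqctl set_log_level info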
Current Resolution:
The long-term fix will be included in an upcoming version, but there is no exact ETA for it yet.
The workaround is to manually remove the old files from the downstream cluster node until the release containing the fix is GA.
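A minimal sketch of that cleanup, assuming the default layout where stream data lives under the node data directory; the paths below are examples, the commands must be run on the downstream node, and only the oldest .segment/.index pairs should be removed, never the most recent ones.

# Example paths only -- locate the internal OSR stream directory on the downstream node
find /var/lib/rabbitmq/mnesia -type d -name "*internal_osr_messages*"
# List files oldest-first and remove only the oldest .segment/.index pairs
ls -ltr <stream_directory>/*.segment <stream_directory>/*.index
rm <stream_directory>/<oldest>.segment <stream_directory>/<oldest>.index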