Handling Network Partitions on Tanzu RabbitMQ for Tanzu Application Service

Article ID: 377256


Products

Services Suite

Issue/Introduction

If you suddenly cannot create or bind to a RabbitMQ service instance, the cluster may be experiencing a network partition. Search the RabbitMQ logs for the following message:

"** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, ....."
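As a rough illustration of the log search described above, the following Python sketch scans log lines for the partition message. It matches only the part of the message quoted in this article; the function name find_partition_events and the log file path in the usage note are assumptions for illustration, not part of any RabbitMQ tooling.

```python
import re

# Matches the portion of the Mnesia event quoted in this article.
# Anything after "running_partitioned_network" is deliberately not parsed.
PARTITION_PATTERN = re.compile(
    r"mnesia_event got \{inconsistent_database,\s*running_partitioned_network"
)


def find_partition_events(lines):
    """Return (line_number, line) pairs for every line reporting a partition.

    `lines` can be any iterable of strings, for example an open log file.
    Line numbers start at 1.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PARTITION_PATTERN.search(line)
    ]
```

To use it, open a RabbitMQ log file (for example with open("rabbitmq.log", encoding="utf-8", errors="replace"), where the file name is hypothetical) and pass the file object to find_partition_events. An empty result means the message was not found in that file.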

Environment

Tanzu RabbitMQ for Tanzu Application Service

Cause

Unstable network conditions can cause network partitions between the nodes of a RabbitMQ cluster.

Resolution

If your configured network partition handling strategy did not recover the cluster automatically, the only remaining option is to restart the affected nodes manually, following the guidance at https://www.rabbitmq.com/docs/partitions

The RabbitMQ Prometheus plugin has a new gauge, rabbitmq_unreachable_cluster_peers_count, that indicates how many cluster peers this node cannot reach.
You can monitor this gauge to detect network partitions: a value greater than 0 means that at least one peer is currently unreachable from this node.
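The following Python sketch shows one way such a check could look: it reads the gauge from Prometheus text-format output. It assumes the rabbitmq_prometheus plugin is enabled and that its metrics endpoint is reachable from where the script runs; the helper names fetch_metrics and parse_unreachable_peers are hypothetical, and the URL in the usage note must be adapted to your deployment.

```python
import urllib.request

METRIC_NAME = "rabbitmq_unreachable_cluster_peers_count"


def fetch_metrics(url, timeout=5.0):
    """Download the Prometheus text-format metrics from the given URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_unreachable_peers(metrics_text):
    """Return the first sample of the gauge as an int, or None if absent.

    Handles both `name value` and `name{labels} value` sample lines.
    Label values that contain "}" are not supported by this simple parser.
    """
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line.startswith(METRIC_NAME):
            continue  # comments (# HELP / # TYPE) and other metrics
        rest = line[len(METRIC_NAME):]
        if rest.startswith("{"):
            # Skip the label set that follows the metric name.
            rest = rest[rest.index("}") + 1:]
        elif not rest[:1].isspace():
            # A different metric that merely shares this prefix.
            continue
        fields = rest.split()
        if fields:
            return int(float(fields[0]))
    return None
```

A possible usage, assuming the plugin listens on its default port 15692 on a node you can reach: parse_unreachable_peers(fetch_metrics("http://<node-address>:15692/metrics")). A result above 0 suggests that this node cannot reach some of its peers, and a result of None means the gauge was not present in the output, for instance on a RabbitMQ version that predates it.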

Additional Information

https://github.com/rabbitmq/rabbitmq-server/discussions/9497