RabbitMQ Upgrade Stuck During Post-Deploy While Enabling Feature Flags
search cancel

RabbitMQ Upgrade Stuck During Post-Deploy While Enabling Feature Flags

book

Article ID: 426128

calendar_today

Updated On:

Products

VMware Tanzu Platform Data VMware Tanzu Data Suite VMware Tanzu Data Suite VMware Tanzu Data Intelligence RabbitMQ Support Only for OpenSource RabbitMQ VMware Tanzu RabbitMQ

Issue/Introduction

RabbitMQ clusters upgraded from Tile 2.3.5 (RabbitMQ 3.12.14) to Tile 2.4.3 (RabbitMQ 3.13.11) became stuck during the post-deploy phase.
The upgrade consistently stalled while executing the command:

rabbitmqctl enable_feature_flag all

As a result, the “Apply Changes” operation in Ops Manager did not complete.

Environment

Upgrade version : Tile 2.3.5 (RabbitMQ 3.12.14) to Tile 2.4.3 (RabbitMQ 3.13.11)

Cause

During the post-deploy script, "rabbitmqctl enable_feature_flag all" runs on all cluster nodes. On affected nodes, logs shows that Feature flag enablement started. Some flags (e.g., quorum_queue_non_voters, message_containers) were enabled but the final log line Feature flags enabled never appeared. For example, 

18:2026-01-17 12:40:13.581412+00:00 [notice] <...> Feature flags: checking nodes `rabbit@1` and `rabbit@2` compatibility...
19:2026-01-17 12:40:13.584773+00:00 [notice] <...> Feature flags: nodes `rabbit@1` and `rabbit@2` are compatible
75:2026-01-17 12:40:14.474665+00:00 [info] <...> Running boot step feature_flags defined by app rabbit
31234:2026-01-17 12:43:13.983087+00:00 [notice] <0.830.0> Feature flags: attempt to enable `quorum_queue_non_voters`...
31235:2026-01-17 12:43:14.231609+00:00 [notice] <0.830.0> Feature flags: `quorum_queue_non_voters` enabled
31236:2026-01-17 12:43:14.306592+00:00 [notice] <0.830.0> Feature flags: attempt to enable `some feature flag`...
31237:2026-01-17 12:43:14.427828+00:00 [notice] <0.830.0> Feature flags: attempt to enable `message_containers`...
31238:2026-01-17 12:43:14.664699+00:00 [notice] <0.830.0> Feature flags: `message_containers` enabled

\Note how the following log line is missing:

Feature flags: `some feature flag` enabled

Cause of this is that the rabbit_ff_controller process became blocked while attempting to check feature flags synchronously during message publication.
This blocking occurred due to a global lock acquisition inside rabbit_feature_flags:is_enabled/1, triggered indirectly by logging via the RabbitMQ log exchange.

As a result:

  • rabbit_ff_controller became stuck

  • rabbit_event message queue grew and stalled

  • Feature flag enablement never completed

This is a known issue described in RabbitMQ discussion #11652.

Resolution

Workaround

The stuck upgrade was resolved by restarting all RabbitMQ nodes:

  1. Stop all nodes using: 

    monit stop
  2. Start all nodes using: 

    monit start
  3. Re-run Apply Changes in Ops Manager

After restart, the post-deploy script completed successfully and all feature flags were enabled.

Fix

The issue is fixed by updating RabbitMQ so that feature flag checks are performed in a non-blocking manner when publishing messages.

The fix is included in:

These changes prevent rabbit_ff_controller from blocking on feature flag checks during logging.