[upgrade-all-service-instances] 2026/01/11 04:56:15.802812 [upgrade-all] FINISHED PROCESSING Status: FAILED; Summary: Number of successful operations: 10; Number of skipped operations: 0; Number of service instance orphans detected: 0; Number of deleted instances before operation could happen: 0; Number of busy instances which could not be processed: 0; Number of service instances that failed to process: 1 [########-####-####-####-624vl4p1d9v0]
[upgrade-all-service-instances] 2026/01/11 04:56:15.802828 [########-####-####-####-624vl4p1d9v0] Operation failed: bosh task id 0:
stderr: Error: failed to run job-process: exit status 1 (exit status 1)
The upgrade-all-service-instances errand fails to process service instance ########-####-####-####-624vl4p1d9v0. Running 'bosh -d service-instance_########-####-####-####-624vl4p1d9v0 vms' shows one or more instances in an Unresponsive Agent or Failed state. This was observed on Tanzu RabbitMQ for Tanzu Application Service when upgrading the tile from version 6.0.12 to 10.0.3 (RabbitMQ versions 3.3.16 to 4.0.12).
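As a quick confirmation of the symptom, the VM and process health of the affected deployment can be listed with the bosh CLI; the deployment name below assumes the service-instance_<GUID> naming used for on-demand service instances:
# List VM and process health for the affected service instance deployment
bosh -d service-instance_########-####-####-####-624vl4p1d9v0 vms
# Instances reported with an "unresponsive agent" or "failed" process state correspond to the failed instance in the errand output above.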
The cause of the upgrade-all-service-instances failure during the RabbitMQ tile upgrade is not conclusive based on the Ops Manager logging presented above. Deeper investigation into the BOSH task logging is required in order to isolate the exact cause of the failure. The resolution steps below help identify exactly which job is failing on the deployment in question:
First, list the most recent BOSH tasks:
bosh tasks -r=10
Locate the task for deployment service-instance_########-####-####-####-624vl4p1d9v0 with the description "create deployment", note its task ID, and then review that task:
bosh task <TASK_ID>
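If the standard task output does not contain enough detail, the same task can be inspected at higher verbosity (the task ID 1234 below is only a placeholder):
# Event-level output for the task
bosh task 1234 --event
# Full debug-level Director log for the same task
bosh task 1234 --debug
In the failure described here, the task event output contained the following entries, showing the rabbitmq-server post-start script failing on the canary instance: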
{"time":1768346254,"stage":"Updating instance","tags":["rabbitmq-server"],"total":2,"task":"rabbitmq-server/########-####-####-####-a2e198c173d5 (1) (canary)","index":1,"state":"in_progress","progress":100,"data":{"status":"executing post-start"}}
{"time":1768346264,"stage":"Updating instance","tags":["rabbitmq-server"],"total":2,"task":"rabbitmq-server/########-####-####-####-a2e198c173d5 (1) (canary)","index":1,"state":"failed","progress":100,"data":{"error":"Action Failed get_task: Task ########-####-####-####-c5175a153de2 result: 1 of 3 post-start scripts failed. Failed Jobs: rabbitmq-server. Successful Jobs: ipsec, bosh-dns."}}
{"time":1768346264,"error":{"code":450001,"message":"Action Failed get_task: Task ########-####-####-####-c5175a153de2 result: 1 of 3 post-start scripts failed. Failed Jobs: rabbitmq-server. Successful Jobs: ipsec, bosh-dns."}}
The failing post-start script can then be inspected directly on the affected instance:
bosh ssh -d service-instance_########-####-####-####-624vl4p1d9v0 rabbitmq-server/########-####-####-####-a2e198c173d5
cat /var/vcap/sys/log/rabbitmq-server/post-start.stdout.log
2026-01-12T23:42:14Z: Wait for RabbitMQ node startup...
Waiting for pid file '/var/vcap/sys/run/rabbitmq-server/pid' to appear
pid is 17086
Waiting for erlang distribution on node 'rabbit@############################ab7a' while OS process '17086' is running
Waiting for applications 'rabbit_and_plugins' to start on node 'rabbit@############################ab7a'
Applications 'rabbit_and_plugins' are running on node 'rabbit@############################ab7a'
2026-01-12T23:42:39Z: Running node checks at Mon Jan 12 11:42:39 PM UTC 2026 from post-start...
2026-01-12T23:42:41Z: Node checks running from post-start passed
2026-01-12T23:42:41Z: Running cluster checks from post-start...
Testing TCP connections to all active listeners on node rabbit@############################ab7a using hostname resolution ...
Will connect to ############################ab7a:15672
Will connect to ############################ab7a:15692
Will connect to ############################ab7a:25672
Will connect to ############################ab7a:5672
Successfully connected to ports 5672, 15672, 15692, 25672 on node rabbit@############################ab7a (using node hostname resolution)
Checking if all vhosts are running on node rabbit@############################ab7a ...
Node rabbit@############################ab7a reported all vhosts as running
User 'guest' exists
2026-01-12T23:42:45Z: RabbitMQ cluster is not healthy
In the instance failure above, the RabbitMQ instances had previously been recreated using 'bosh cck' commands. Because of this, a new RabbitMQ cluster was created when the instance upgrade was attempted. Corrective action required manually updating users as well as manually re-joining the nodes to the cluster; any messaging data that has not already been processed will be lost as a result.
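To confirm this kind of cluster split, compare the cluster membership reported by each node. A minimal check, assuming rabbitmqctl is available on the instance (the exact path can vary by tile version), is:
# Run on each rabbitmq-server instance after 'bosh ssh' and 'sudo -i'
rabbitmqctl cluster_status
# Compare the disk/running node lists across instances; a node that has formed a new cluster will not list the other members.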
Please open a support ticket with the Broadcom support team for help correcting the cluster replication, or recreate the On-Demand cluster.