Provisioning or Day-2 Operations slow to execute

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

Provisioning is taking a long time or may even fail
Day-2 actions are taking a longer time to complete (4-8+ minutes).

RabbbitMQ logs /services-logs/prelude/rabbitmq-ha-#/file-logs/rabbitmq-ha.log show the following:

2025-07-23 13:58:58.903554+00:00 [info] <0.889.0> {'%2F_com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########','rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local'}: leader call - leader not known. Command will be forwarded once leader is known.
2025-07-23 13:58:58.903829+00:00 [info] <0.889.0> {'%2F_com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########','rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local'}: leader call - leader not known. Command will be forwarded once leader is known.
2025-07-23 13:58:58.906569+00:00 [info] <0.889.0> {'%2F_com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########','rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local'}: leader call - leader not known. Command will be forwarded once leader is known.

Get quorum status command:

kubectl -n prelude exec -it rabbitmq-ha-0 -- rabbitmq-queues quorum_status com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########

Output will show all rows as follower like this:

┌──────────────────────────────────────────────────────────────────────┬────────────┬───────────┬──────────────┬────────────────┬──────┬─────────────────┐
│ Node Name                                                            │ Raft State │ Log Index │ Commit Index │ Snapshot Index │ Term │ Machine Version │
├──────────────────────────────────────────────────────────────────────┼────────────┼───────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ rabbit@rabbitmq-ha-2.rabbitmq-ha-discovery.prelude.svc.cluster.local │ follower   │ 0         │ 0            │ undefined      │ 0    │ 3               │
├──────────────────────────────────────────────────────────────────────┼────────────┼───────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ rabbit@rabbitmq-ha-1.rabbitmq-ha-discovery.prelude.svc.cluster.local │ follower   │ 0         │ 0            │ undefined      │ 0    │ 3               │
├──────────────────────────────────────────────────────────────────────┼────────────┼───────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local │ follower   │ 0         │ 0            │ undefined      │ 0    │ 3               │
└──────────────────────────────────────────────────────────────────────┴────────────┴───────────┴──────────────┴────────────────┴──────┴─────────────────┘

Aria Automation portal shows provisioning failures like this:

Extensibility triggered task failed. Event ID: ########-####-####-####-###########1. Failure: Extensibility error received for topic provisioning.request.pre, eventId = '########-####-####-####-###########1': [666] Error publishing event Event(super=BaseEvent(id=########-####-####-####-###########1, sourceType=provisioning, sourceIdentity=########-####-####-####-###########2, timeStamp=Mon Nov 03 12:00:06 GMT 2025, headers={provisioning-callback-uri=/provisioning/config/extensibility-callbacks/########-####-####-####-###########1, tokenId=############################################, encryption-context=########-####-####-####-###########3}, data={componentId=Cloud_NSX_Network_1, externalIds=[null], blueprintId=########-####-####-####-###########4, tags={}, customProperties={isSimulate=false}, componentTypeId=Cloud.NSX.Network, requestId=########-####-####-####-###########5, resourceCount=1, deploymentId=########-####-####-####-###########3, operation=ALLOCATE_RESOURCE, projectId=########-####-####-####-###########6, resourceType=COMPUTE_NETWORK, resourceIds=[########-####-####-####-###########7]}), eventType=EVENT, eventTopicId=provisioning.request.pre, correlationType=contextId, correlationId=########-####-####-####-###########3--########-####-####-####-###########5, description=null, targetType=RequestBrokerState, targetId=########-####-####-####-###########8, userName=kocpadm, orgId=########-####-####-####-###########9, projectId=########-####-####-####-###########6). Data = {componentId=Cloud_NSX_Network_1, externalIds=[null], blueprintId=########-####-####-####-###########4, tags={}, customProperties={isSimulate=false}, componentTypeId=Cloud.NSX.Network, requestId=########-####-####-####-###########5, resourceCount=1, deploymentId=########-####-####-####-###########3, operation=ALLOCATE_RESOURCE, projectId=########-####-####-####-###########6, resourceType=COMPUTE_NETWORK, resourceIds=[########-####-####-####-###########7]} For serviceCallback=http://10.###.###.###:8282/provisioning...

Environment

Aria Automation 8.18.x

Resolution

Take snapshot of the cluster VMs.
Find the leaderless queues from the rabbitmq logs:
1. grep 'leader not known' /services-logs/prelude/rabbitmq-ha-0/file-logs/rabbitmq-ha.log | cut -d\' -f2 | sort | uniq
2. If the returned queue names begin with "%2F_" remove it. This turns "%2F_com.vmware.automation..." into "com.vmware.automation...".
Get an interactive shell within the rabbitmq pod:
1. kubectl -n prelude exec -it rabbitmq-ha-0 -- /bin/bash
Get the quorum status of the queues returned by the grep command above:
1. rabbitmq-queues quorum_status com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########
2. You should see that each node is marked as a "follower" in the Raft State column
Once confirmed that the queue shows all nodes as followers use the command here to trigger a leader election amongst the nodes:
1. rabbitmqctl eval ' V = <<"/">>, QName = <<"com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########">>, {ok, QRec} = rabbit_amqqueue:lookup(rabbit_misc:r(V, queue, QName)), RaID = amqqueue:get_pid(QRec), ra:trigger_election(RaID).'
Check the quorum status again, we should now see one leader node and two follower nodes:
1. rabbitmq-queues quorum_status com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########

Note: If the above steps do not resolve the issue, or if you run into any difficulties, please raise a case with Broadcom for further investigation. Alternatively the issue can also be resolved with resetting RabbitMQ once all provisioning is stopped: RabbitMQ cluster issue causes deployment failure in Aria Automation

Additional Information

When running step 4 in the resolution, if one of the queues is showing "noproc" for a node, then you will need to reset rabbitMQ to resolve, following the instructions in Resolve RabbitMQ cluster issues in vRA 8.x deployment