Provisioning or Day-2 Operations slow to execute


Article ID: 407028


Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • Provisioning takes a long time or fails.
  • Day-2 actions take longer than expected to complete (4-8+ minutes).
  • The RabbitMQ log /services-logs/prelude/rabbitmq-ha-#/file-logs/rabbitmq-ha.log shows entries like the following:
    2025-07-23 13:58:58.903554+00:00 [info] <0.889.0> {'%2F_com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########','rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local'}: leader call - leader not known. Command will be forwarded once leader is known.
    2025-07-23 13:58:58.903829+00:00 [info] <0.889.0> {'%2F_com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########','rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local'}: leader call - leader not known. Command will be forwarded once leader is known.
    2025-07-23 13:58:58.906569+00:00 [info] <0.889.0> {'%2F_com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########','rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local'}: leader call - leader not known. Command will be forwarded once leader is known.
  • Checking the quorum status of the affected queue:
    kubectl -n prelude exec -it rabbitmq-ha-0 -- rabbitmq-queues quorum_status com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########
    shows every node in the follower state, with no leader:
    ┌──────────────────────────────────────────────────────────────────────┬────────────┬───────────┬──────────────┬────────────────┬──────┬─────────────────┐
    │ Node Name                                                            │ Raft State │ Log Index │ Commit Index │ Snapshot Index │ Term │ Machine Version │
    ├──────────────────────────────────────────────────────────────────────┼────────────┼───────────┼──────────────┼────────────────┼──────┼─────────────────┤
    │ rabbit@rabbitmq-ha-2.rabbitmq-ha-discovery.prelude.svc.cluster.local │ follower   │ 0         │ 0            │ undefined      │ 0    │ 3               │
    ├──────────────────────────────────────────────────────────────────────┼────────────┼───────────┼──────────────┼────────────────┼──────┼─────────────────┤
    │ rabbit@rabbitmq-ha-1.rabbitmq-ha-discovery.prelude.svc.cluster.local │ follower   │ 0         │ 0            │ undefined      │ 0    │ 3               │
    ├──────────────────────────────────────────────────────────────────────┼────────────┼───────────┼──────────────┼────────────────┼──────┼─────────────────┤
    │ rabbit@rabbitmq-ha-0.rabbitmq-ha-discovery.prelude.svc.cluster.local │ follower   │ 0         │ 0            │ undefined      │ 0    │ 3               │
    └──────────────────────────────────────────────────────────────────────┴────────────┴───────────┴──────────────┴────────────────┴──────┴─────────────────┘
  • The Aria Automation portal shows provisioning failures like this:

    Extensibility triggered task failed. Event ID: ########-####-####-####-###########1. Failure: Extensibility error received for topic provisioning.request.pre, eventId = '########-####-####-####-###########1': [666] Error publishing event Event(super=BaseEvent(id=########-####-####-####-###########1, sourceType=provisioning, sourceIdentity=########-####-####-####-###########2, timeStamp=Mon Nov 03 12:00:06 GMT 2025, headers={provisioning-callback-uri=/provisioning/config/extensibility-callbacks/########-####-####-####-###########1, tokenId=############################################, encryption-context=########-####-####-####-###########3}, data={componentId=Cloud_NSX_Network_1, externalIds=[null], blueprintId=########-####-####-####-###########4, tags={}, customProperties={isSimulate=false}, componentTypeId=Cloud.NSX.Network, requestId=########-####-####-####-###########5, resourceCount=1, deploymentId=########-####-####-####-###########3, operation=ALLOCATE_RESOURCE, projectId=########-####-####-####-###########6, resourceType=COMPUTE_NETWORK, resourceIds=[########-####-####-####-###########7]}), eventType=EVENT, eventTopicId=provisioning.request.pre, correlationType=contextId, correlationId=########-####-####-####-###########3--########-####-####-####-###########5, description=null, targetType=RequestBrokerState, targetId=########-####-####-####-###########8, userName=kocpadm, orgId=########-####-####-####-###########9, projectId=########-####-####-####-###########6). 
Data = {componentId=Cloud_NSX_Network_1, externalIds=[null], blueprintId=########-####-####-####-###########4, tags={}, customProperties={isSimulate=false}, componentTypeId=Cloud.NSX.Network, requestId=########-####-####-####-###########5, resourceCount=1, deploymentId=########-####-####-####-###########3, operation=ALLOCATE_RESOURCE, projectId=########-####-####-####-###########6, resourceType=COMPUTE_NETWORK, resourceIds=[########-####-####-####-###########7]} For serviceCallback=http://10.###.###.###:8282/provisioning...
         

Environment

Aria Automation 8.18.x

 

Resolution

  1. Take a snapshot of the cluster VMs.
  2. Find the leaderless queues in the RabbitMQ logs:
    1. grep 'leader not known' /services-logs/prelude/rabbitmq-ha-0/file-logs/rabbitmq-ha.log | cut -d\' -f2 | sort | uniq
    2. If a returned queue name begins with "%2F_" (the URL-encoded "/" virtual host followed by an underscore), remove that prefix. This turns "%2F_com.vmware.automation..." into "com.vmware.automation...".
  3. Get an interactive shell within the RabbitMQ pod:
    1. kubectl -n prelude exec -it rabbitmq-ha-0 -- /bin/bash
  4. Get the quorum status of the queues returned by the grep command above:
    1. rabbitmq-queues quorum_status com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########
    2. Each node should be marked as "follower" in the Raft State column.
  5. Once you have confirmed that the queue shows all nodes as followers, run the following command to trigger a leader election among the nodes:
    1. rabbitmqctl eval ' V = <<"/">>, QName = <<"com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-###########">>, {ok, QRec} = rabbit_amqqueue:lookup(rabbit_misc:r(V, queue, QName)), RaID = amqqueue:get_pid(QRec), ra:trigger_election(RaID).' 
  6. Check the quorum status again. The output should now show one leader node and two follower nodes:
    1. rabbitmq-queues quorum_status com.vmware.automation.relocation-service-deployment-resource-action-post-qq-########-####-####-####-########### 
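
When several queues are affected, steps 2 and 5 can be scripted. The following is only a minimal sketch and not part of the documented procedure. The function names leaderless_queues and election_expr are hypothetical. The Erlang expression is copied from step 5 with only the queue name substituted, and it assumes the queues live in the "/" virtual host, as in that command.

```shell
# Hypothetical helper: read RabbitMQ log lines on stdin and print the unique
# names of queues reported as leaderless, with the "%2F_" prefix removed
# (same pipeline as step 2, plus the prefix removal described there).
leaderless_queues() {
  grep 'leader not known' \
    | cut -d\' -f2 \
    | sed 's/^%2F_//' \
    | sort -u
}

# Hypothetical helper: print the step 5 Erlang expression for one queue name.
# Assumes the "/" virtual host, exactly as in the step 5 command.
election_expr() {
  printf 'V = <<"/">>, QName = <<"%s">>, {ok, QRec} = rabbit_amqqueue:lookup(rabbit_misc:r(V, queue, QName)), RaID = amqqueue:get_pid(QRec), ra:trigger_election(RaID).' "$1"
}

# Possible use (not executed here). Only trigger an election for a queue
# after step 4 has confirmed that all of its members are followers:
#   leaderless_queues < /services-logs/prelude/rabbitmq-ha-0/file-logs/rabbitmq-ha.log
#   rabbitmqctl eval "$(election_expr "<queue-name>")"   # inside the pod
```

Keeping the queue discovery and the expression building as separate functions means the list of queues can be reviewed by hand before any election is triggered.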

Note: If the above steps do not resolve the issue, or if you run into any difficulties, raise a case with Broadcom for further investigation. Alternatively, the issue can be resolved by resetting RabbitMQ once all provisioning has stopped. See: RabbitMQ cluster issue causes deployment failure in Aria Automation

Additional Information

When running step 4 in the resolution, if any queue shows "noproc" for a node, RabbitMQ must be reset to resolve the issue. Follow the instructions in Resolve RabbitMQ cluster issues in vRA 8.x deployment.
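
The three outcomes of the quorum status check (a leader exists, all nodes are followers, or a node reports "noproc") can be told apart mechanically. The following is a minimal sketch under stated assumptions: quorum_state is a hypothetical helper name, the word "noproc" is assumed to appear literally in the command output, and a healthy queue is assumed to show "leader" in the Raft State column (this article only shows follower rows).

```shell
# Hypothetical helper: classify "rabbitmq-queues quorum_status" output read
# from stdin. Prints one of: noproc, has-leader, leaderless, unknown.
quorum_state() {
  _qs_out=$(cat)
  if printf '%s\n' "$_qs_out" | grep -q 'noproc'; then
    echo noproc        # a member reports noproc: reset RabbitMQ instead
  elif printf '%s\n' "$_qs_out" | grep -qw 'leader'; then
    echo has-leader    # a leader exists: no election needed
  elif printf '%s\n' "$_qs_out" | grep -qw 'follower'; then
    echo leaderless    # only followers: candidate for a leader election
  else
    echo unknown       # unrecognized output: inspect manually
  fi
}

# Possible use inside the pod (not executed here):
#   rabbitmq-queues quorum_status "<queue-name>" | quorum_state
```

The noproc check comes first so that a queue with a missing member is never treated as a simple election candidate.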