Machine builds and deployments fail in VMware Aria Automation. A deployment fails to start and returns an event broker service (EBS) event error. An error similar to the following appears in the UI or in the deployment execution logs:
Failed to process ebs event: SubscriberID: vro-gateway-<UUID>, RunnableID: <RunnableID UUID> and SubscriptionID: <UUID>-sub_<UUID> failed with the following error: Workflow run [<workflow run UUID>] completed with error [Start workflow failed with client error! Possible reason is missing or invalid required parameter!]
In addition, the internal pod logs show the following exception in ebs-app.log:
Cause: : : com.vmware.automation.spring.webflux.platform.client.service.exception.WebClientServiceResponseException: ClientResponse has erroneous status code: 500 Internal Server Error.
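To confirm that a saved copy of the log contains this exception, a small filter along the following lines may help. This is a minimal sketch: the function name find_ebs_500 is hypothetical, and the location of ebs-app.log is not stated here, so the path is passed in as an argument rather than assumed.

```shell
# Hypothetical helper: print lines from the log file given as $1 that contain
# both the exception class name and the HTTP 500 status text quoted above.
# The log path is an assumption supplied by the caller.
find_ebs_500() {
  grep -F 'WebClientServiceResponseException' "$1" \
    | grep -F '500 Internal Server Error'
}
```

For example, find_ebs_500 /path/to/ebs-app.log prints the matching lines, or nothing if the exception is absent.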
VMware Aria Automation 8.18.1
This issue occurs when a network outage interrupts communication between internal Kubernetes services. The network disruption causes the event broker service (ebs-app) and related messaging queues (such as rabbitmq) to enter a stale state where they can no longer process events or trigger workflow runs correctly, resulting in HTTP 500 Internal Server Errors.
To resolve this issue, you must restart the underlying Kubernetes and application pods to force them to re-establish healthy network connections:
Log in to the VMware Aria Automation appliance via SSH.
Restart the pods in the kube-system namespace by running the following command:
kubectl -n kube-system delete pods --all
Restart the prelude services by executing the deployment script:
/opt/scripts/deploy.sh
Monitor the pod startup process. Once all pods return to a Running state, attempt to deploy a new machine to confirm the issue is resolved.
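One way to monitor pod startup is to count the pods that have not yet reached a Running state. The sketch below assumes the application pods run in a namespace named prelude (inferred from the "prelude services" wording above) and uses a hypothetical helper name, count_unready_pods. It checks only the STATUS column, not the READY column.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and print how many pods have a
# STATUS other than Running or Completed. Prints 0 when there are none.
count_unready_pods() {
  awk '$3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}

# Example usage (the prelude namespace is an assumption):
#   kubectl -n prelude get pods --no-headers | count_unready_pods
```

When the command prints 0, every listed pod reports Running or Completed, and a test deployment can be attempted.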
The ebs-app and rabbitmq service communication errors observed during this outage are consistent with the behaviors documented in KB 319575.