In a multi-node, highly available VMware Aria Automation environment fronted by a load balancer, you may experience intermittent failures when making calls to the vRA API. Specifically, vRealize Orchestrator (vRO) workflows running Python scripts that call the IaaS API may fail and report the following error within the logs or client:
Unable to pull project information...
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
ebs.error.code=10040Additionally, you may observe the load balancer health monitor intermittently flapping or marking a specific node as DOWN for brief periods (e.g., 30 seconds to 8 minutes).
This issue occurs when the provisioning-service-app pod on a specific node begins consuming excessive CPU cycles. This localized CPU spike starves the health-reporting-app pod of resources, causing latency on the port 8008 /health endpoint to drift past the load balancer's configured response timeout (typically 6 seconds).
While the load balancer accumulates failed retries (the detection gap), it continues to route API traffic to the unhealthy node. The node accepts the TCP connection at the SSL bridge layer but closes it without responding, which triggers the RemoteDisconnected error in the calling workflow or client.
To resolve the CPU starvation and stabilize the affected node, you must force Kubernetes to recreate the provisioning pod. You should also adjust your load balancer and workflow configurations to prevent future connection drops.
provisioning-service-app pod running on it.kubectl delete pod -n prelude <provisioning-service-pod-name>resptimeout) on your VMware Aria Automation health monitor from 6 seconds to 15 seconds to avoid false negatives during heavy service load.8008 health endpoint for cluster monitoring.