In Aria Automation 8.18.x, pipelines may fail to execute with the error message:
Error in enqueue exception: Retries exhausted: 15/15.
ERROR codestream [host='<Codestream-POD>' thread='parallel-6' user='' org='' trace='' parent='' span=''] c.v.c.s.impl.ExecutionQueueServiceImpl.lambda$queueExecution$4:97 - Error in enqueue exception Retries exhausted: 15/15
DEBUG codestream [host='<Codestream-POD>' thread='parallel-6' user='' org='' trace='' parent='' span=''] o.s.w.r.f.client.ExchangeFunctions.traceDebug:120 - [49f87b3f] HTTP POST http://<Code_Stream_Internal_IP>:8000/codestream/api/callback/resource/#######-####-####-####-########
ERROR codestream [host='<Codestream-POD>' thread='parallel-6' user='' org='' trace='' parent='' span=''] c.v.c.service.impl.ExecutionServiceImpl.lambda$queueExecution$44:776 - Unable to queue execution due to Retries exhausted: 15/15
ERROR codestream [host='<Codestream-POD>' thread='parallel-6' user='' org='' trace='' parent='' span=''] reactor.core.publisher.Operators.error:324 - Operator called default onErrorDropped
reactor.core.Exceptions$ErrorCallbackNotImplemented: reactor.core.Exceptions$RetryExhaustedException: Retries exhausted: 15/15
Caused by: reactor.core.Exceptions$RetryExhaustedException: Retries exhausted: 15/15
Aria Automation 8.18.x
The failure is likely caused by Code Stream threads that are in an anomalous or stuck state, which prevents further task execution.
One of the nodes may have become stuck or unhealthy. When new executions were allocated, the executions that are now stuck were assigned to the pod(s) that were not in a good condition. This could have left them stuck in the NOT_STARTED state.
Restart the Code Stream services by deleting the codestream pods and allowing them to be recreated.
Note: Take snapshots of all Aria Automation nodes (without memory) from vCenter before performing the below steps.
kubectl get pods -n prelude | grep code
kubectl delete pod <codestream_pod_name> -n prelude
# Example:
kubectl delete pod <Codestream_POD> -n prelude
Note: Repeat the above step for each of the remaining codestream pods on the other nodes.
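On a multi-node deployment, the repeated delete step above could be scripted. The following is a minimal sketch, not an official procedure: the function names and the DRY_RUN variable are hypothetical, and it assumes that every pod whose name contains "code" in the prelude namespace is a codestream pod (the same match the grep command above relies on).

```shell
# Print the names of pods whose name contains "code".
# Reads `kubectl get pods --no-headers` output on stdin.
list_codestream_pods() {
  awk '$1 ~ /code/ { print $1 }'
}

# Delete each matching pod in turn.
# Set DRY_RUN=1 to print the commands instead of running them.
delete_codestream_pods() {
  kubectl get pods -n prelude --no-headers | list_codestream_pods |
  while read -r pod; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "kubectl delete pod $pod -n prelude"
    else
      kubectl delete pod "$pod" -n prelude
    fi
  done
}
```

Running `DRY_RUN=1 delete_codestream_pods` first shows which pods would be deleted before anything is changed.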
watch 'kubectl get pods -n prelude | grep code'
Wait until the STATUS of each codestream pod is Running and READY shows 3/3 (in a clustered environment).
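As an alternative to watching the output manually, the readiness check could be polled from a script. This is a sketch under stated assumptions: the function names and the 10-second interval are hypothetical, pods are matched by the substring "code" in their name, and a pod is treated as ready when STATUS is Running and the two numbers in the READY column are equal (for example 3/3).

```shell
# Succeeds only if at least one codestream pod is listed on stdin and
# every one of them is Running with all of its containers ready.
# Reads `kubectl get pods --no-headers` output on stdin.
codestream_pods_ready() {
  awk '
    $1 ~ /code/ {
      seen = 1
      split($2, r, "/")                       # READY column, e.g. 3/3
      if ($3 != "Running" || r[1] != r[2]) bad = 1
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}

# Poll until all codestream pods report ready.
wait_for_codestream_pods() {
  until kubectl get pods -n prelude --no-headers | codestream_pods_ready; do
    sleep 10
  done
}
```

Calling `wait_for_codestream_pods` blocks until the check passes; it has no timeout, so interrupt it with Ctrl+C if the pods do not recover.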
Note: The default execution concurrency for each pipeline is 10.