Pipeline Fails with "Error in enqueue exception: Retries exhausted: 15/15"

Article ID: 405130

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

In Aria Automation 8.18.x, pipelines may fail to execute with the error message:
Error in enqueue exception: Retries exhausted: 15/15.

ERROR codestream [host='<Codestream-POD>' thread='parallel-6' user='' org='' trace='' parent='' span=''] c.v.c.s.impl.ExecutionQueueServiceImpl.lambda$queueExecution$4:97 - Error in enqueue exception Retries exhausted: 15/15
DEBUG codestream [host='<Codestream-POD>' thread='parallel-6' user='' org='' trace='' parent='' span=''] o.s.w.r.f.client.ExchangeFunctions.traceDebug:120 - [49f87b3f] HTTP POST http://<Code_Stream_Internal_IP>:8000/codestream/api/callback/resource/#######-####-####-####-########
ERROR codestream [host='<Codestream-POD>' thread='parallel-6' user='' org='' trace='' parent='' span=''] c.v.c.service.impl.ExecutionServiceImpl.lambda$queueExecution$44:776 - Unable to queue execution due to Retries exhausted: 15/15
ERROR codestream [host='<Codestream-POD>' thread='parallel-6' user='' org='' trace='' parent='' span=''] reactor.core.publisher.Operators.error:324 - Operator called default onErrorDropped
reactor.core.Exceptions$ErrorCallbackNotImplemented: reactor.core.Exceptions$RetryExhaustedException: Retries exhausted: 15/15
Caused by: reactor.core.Exceptions$RetryExhaustedException: Retries exhausted: 15/15
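
To check whether a codestream pod is logging this signature, its log output can be filtered for the error string. The following is a minimal sketch rather than a verified procedure: the helper name find_retry_exhausted is hypothetical, and --all-containers is used only because this article does not state the container name inside the pod.

```shell
# Hypothetical helper: print only the log lines that carry the
# "Retries exhausted: 15/15" signature. It reads log text on stdin,
# so it works with any log source.
find_retry_exhausted() {
  grep -F 'Retries exhausted: 15/15'
}

# Usage (pod name is a placeholder, as elsewhere in this article):
#   kubectl logs <codestream_pod_name> -n prelude --all-containers | find_retry_exhausted
```

If the helper prints nothing, the signature was not found in that pod's current log output.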

Environment

Aria Automation 8.18.x

Cause

The failure is likely caused by Code Stream threads being in an anomalous or stuck state, which prevents further task execution.

One of the nodes may have become stuck or unhealthy. During allocation of new executions, the executions were then allocated to the pod(s) that were not in a healthy condition, which could have left them stuck in the NOT_STARTED state.

Resolution

Restart the Code Stream services by deleting the codestream pods and allowing them to be recreated.

Note: Take snapshots of all Aria Automation nodes (without memory) from vCenter before performing the steps below.

Get the codestream pod names:

kubectl get pods -n prelude | grep code

Delete the codestream pods:

kubectl delete pod <codestream_pod_name> -n prelude
# Example:
kubectl delete pod <Codestream_POD> -n prelude

Note: Repeat the above step for each of the remaining codestream pods on the other nodes.
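
The list-and-delete steps above can also be combined into one loop. This is a sketch under assumptions, not part of the documented procedure: the function names are hypothetical, and the pods are matched with the same grep code filter used above, so review the matched names before deleting anything.

```shell
# Hypothetical helper: keep only pod names containing "code",
# mirroring the `grep code` filter used in this article.
filter_codestream_pods() {
  grep code
}

# Hypothetical helper: delete every matching pod in the prelude namespace,
# one at a time. `-o name` prints entries as pod/<name>, which
# `kubectl delete` accepts directly.
delete_codestream_pods() {
  kubectl get pods -n prelude -o name | filter_codestream_pods |
    while read -r pod; do
      kubectl delete "$pod" -n prelude
    done
}

# Usage, after taking the snapshots mentioned above:
#   delete_codestream_pods
```

Deleting the pods one at a time keeps the behavior identical to repeating the manual command for each pod.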

Monitor pod recreation:

watch 'kubectl get pods -n prelude | grep code'

Wait for the pod STATUS to be Running and READY to show 3/3 (in a clustered environment).
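
As an alternative to watching the output by eye, the same check can be scripted. The sketch below assumes the default kubectl get pods column order (NAME, READY, STATUS, ...); the function name all_codestream_ready is hypothetical, and the expected READY value defaults to the 3/3 stated above but can be overridden.

```shell
# Hypothetical helper: succeed only if every input line reports
# STATUS "Running" and the expected READY value (default 3/3).
# Fails on empty input so that "no pods yet" is not treated as ready.
all_codestream_ready() {
  expected="${1:-3/3}"
  awk -v want="$expected" '
    { seen = 1; if ($2 != want || $3 != "Running") bad = 1 }
    END { if (bad || !seen) exit 1 }
  '
}

# Usage: poll every 10 seconds until all codestream pods are ready.
#   until kubectl get pods -n prelude --no-headers | grep code | all_codestream_ready; do
#     sleep 10
#   done
```

The loop in the usage comment has no timeout, so interrupt it manually if the pods do not recover.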

Additional Information

The default execution concurrency for each pipeline is 10.
