Symptoms:
The TileExecution stats log entry shows that the number of running tiles equals maxConcurrent:
INFO tango-blueprint [host='tango-blueprint-service-app-xxx' thread='generalScheduler-4' user='' org='' blueprint='' project='' deployment='' request='' flow='' task='' tile='' resourceName='' operation='' trace=''] com.vmware.tango.blueprint.telemetry.LogMetricsConfig - TileExecution stats: running=150 created=3 maxConcurrent=150 batchSize=5 rescheduled=2 currentPendingBlockingRequests=0 totalBlockingRequests=14 totalPendingBlockingRequests=0 currentNextExecutions=0
java.lang.IllegalArgumentException: Missing: traceId spanId
The entries above are found in: /services-logs/prelude/tango-blueprint-service-app/file-logs/tango-blueprint-service-app.log
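As a rough way to check for this symptom, the sketch below reads log lines on standard input and reports whether the most recent TileExecution stats entry has running at or above maxConcurrent. The function name tile_stats_status and its three output words are illustrative assumptions, not part of the product.

```shell
# Hypothetical helper (not a product tool): reads log lines on stdin and
# prints "saturated" when the last "TileExecution stats" entry reports
# running >= maxConcurrent, "ok" when it does not, and "no-stats" when no
# usable entry was found.
tile_stats_status() {
  awk '
    /TileExecution stats:/ {
      running = ""; max = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^running=[0-9]+$/)       { split($i, kv, "="); running = kv[2] }
        if ($i ~ /^maxConcurrent=[0-9]+$/) { split($i, kv, "="); max = kv[2] }
      }
    }
    END {
      if (running == "" || max == "") { print "no-stats"; exit }
      if (running + 0 >= max + 0) print "saturated"; else print "ok"
    }'
}
```

A possible invocation, assuming the log path given above: tile_stats_status < /services-logs/prelude/tango-blueprint-service-app/file-logs/tango-blueprint-service-app.log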
This issue affects VMware Aria Automation systems that have been upgraded to 8.18.0.
This can be caused by in-progress deployments at the time of upgrade. These may be waiting for approval.
As such, if the upgrade can be rolled back to the pre-upgrade snapshot, the issue may be avoided by ensuring that no actions or deployments are running when upgrading to 8.18.0.
These incomplete pre-upgrade jobs are repeatedly added to the execution queue until the concurrent maximum is reached and nothing can progress.
It is possible to get some relief from the issue by restarting all tango-blueprint pods, using the following commands:
kubectl scale deployment -n prelude tango-blueprint-service-app --replicas=0
For a 3-node cluster:
kubectl scale deployment -n prelude tango-blueprint-service-app --replicas=3
For a single-node system:
kubectl scale deployment -n prelude tango-blueprint-service-app --replicas=1
When the pods have terminated and been recreated, all hung deployments & day2 actions should progress.
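The restart sequence above could be wrapped in a small function, sketched here under some assumptions. The function name restart_tango_blueprint and the POLL_SECONDS variable are hypothetical. The polling loop reuses the pod listing that appears later in this article, and the final kubectl rollout status call is an optional addition that simply waits for the recreated pods to become ready.

```shell
# Hypothetical wrapper around the scale-down / scale-up commands.
# Argument: 3 for a 3-node cluster, 1 for a single-node system.
restart_tango_blueprint() {
  replicas="$1"
  case "$replicas" in
    1|3) ;;
    *) echo "usage: restart_tango_blueprint <1|3>" >&2; return 2 ;;
  esac

  # Stop all tango-blueprint pods.
  kubectl scale deployment -n prelude tango-blueprint-service-app --replicas=0 || return 1

  # Poll until no tango-blueprint pods are listed any more.
  while kubectl -n prelude get pods | grep -q tango-blueprint; do
    sleep "${POLL_SECONDS:-5}"
  done

  # Recreate the pods and wait for the rollout to finish.
  kubectl scale deployment -n prelude tango-blueprint-service-app --replicas="$replicas" || return 1
  kubectl -n prelude rollout status deployment/tango-blueprint-service-app
}
```

For example, restart_tango_blueprint 3 on a 3-node cluster, or restart_tango_blueprint 1 on a single-node system.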
The following procedure edits the Aria Automation database directly. Take careful note of step 1, and contact VMware Support if assistance is needed.
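Dump the tango-blueprint database to a file as a backup:
vracli db dump tango-blueprint-db > tango-blueprint-db.sql
Scale the tango-blueprint service down to zero replicas:
kubectl scale deployment -n prelude tango-blueprint-service-app --replicas=0
Watch until the tango-blueprint pods have terminated:
watch 'kubectl -n prelude get pods | grep tango-blueprint'
Open a database shell:
vracli dev psql
Connect to the tango-blueprint database:
\c tango-blueprint-db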
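In each of steps 7a to 7c below, the row count shown in the SELECT results, for example (7 rows), should match the count returned from the update transaction, for example UPDATE 7.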
7a. -- Tiles to be updated:
SELECT ID AS TILE_ID, ENV['DEPLOYMENT_ID'] AS DEPLOYMENT_ID, ENV['DEPLOYMENT_NAME'] AS NAME
FROM BP_TILE_EXECUTION
WHERE ENV['TRACE_CONTEXT']::text LIKE '%uber-trace-id%' AND STATUS IN ('SCHEDULED', 'IN_PROGRESS', 'WAITING', 'NOT_STARTED');
--> Update the tiles:
BEGIN;
UPDATE BP_TILE_EXECUTION
SET ENV['TRACE_CONTEXT'] = '"{\"traceId\":\"66ed579ea44c9710d4983c630c023780\",\"spanId\":\"c789ad6d4d5113b1\",\"trace\":\"66ed579ea44c9710d4983c630c023780\",\"traceparent\":\"00-66ed579ea44c9710d4983c630c023780-c789ad6d4d5113b1-00\"}"'
WHERE ID IN
(SELECT ID
FROM BP_TILE_EXECUTION
WHERE ENV['TRACE_CONTEXT']::text LIKE '%uber-trace-id%' AND STATUS IN ('SCHEDULED', 'IN_PROGRESS', 'WAITING', 'NOT_STARTED'));
If the SELECT row count and the "UPDATE _" count do not agree, roll back the transaction. This undoes the UPDATE query, back to the keyword BEGIN. To roll back, run: ROLLBACK;
Otherwise, if the figures do agree, then run the following command to commit the update transaction:
COMMIT;
7b. -- Tasks to be updated
SELECT ID AS TASK_ID , ENV['DEPLOYMENT_ID'] AS DEPLOYMENT_ID, ENV['DEPLOYMENT_NAME'] AS NAME
FROM BP_TASK_EXECUTION
WHERE FLOW_EXECUTION_ID IN
(SELECT ID FROM BP_FLOW_EXECUTION
WHERE ENV['TRACE_CONTEXT']::text LIKE '%uber-trace-id%'
AND STATUS IN ('SCHEDULED','WAITING','IN_PROGRESS'))
AND STATUS IN ('NOT_STARTED','SCHEDULED','IN_PROGRESS','WAITING');
--> Update the tasks
BEGIN;
UPDATE BP_TASK_EXECUTION
SET ENV['TRACE_CONTEXT'] = '"{\"traceId\":\"66ed579ea44c9710d4983c630c023780\",\"spanId\":\"c789ad6d4d5113b1\",\"trace\":\"66ed579ea44c9710d4983c630c023780\",\"traceparent\":\"00-66ed579ea44c9710d4983c630c023780-c789ad6d4d5113b1-00\"}"'
WHERE FLOW_EXECUTION_ID IN
(SELECT ID
FROM BP_FLOW_EXECUTION
WHERE ENV['TRACE_CONTEXT']::text LIKE '%uber-trace-id%'
AND STATUS IN ('SCHEDULED','WAITING','IN_PROGRESS'))
AND STATUS IN ('NOT_STARTED','SCHEDULED','IN_PROGRESS','WAITING');
If the SELECT row count and the "UPDATE _" count do not agree, roll back the transaction. This undoes the UPDATE query, back to the keyword BEGIN. To roll back, run: ROLLBACK;
Otherwise, if the figures do agree, run the following command to commit the update transaction:
COMMIT;
7c. -- Flows to be updated
SELECT ID AS FLOW_ID, ENV['DEPLOYMENT_ID'] AS DEPLOYMENT_ID, ENV['DEPLOYMENT_NAME'] AS NAME
FROM BP_FLOW_EXECUTION
WHERE ENV['TRACE_CONTEXT']::text LIKE '%uber-trace-id%' AND STATUS IN ('SCHEDULED', 'WAITING', 'IN_PROGRESS');
--> Update the flows
BEGIN;
UPDATE BP_FLOW_EXECUTION
SET ENV['TRACE_CONTEXT'] = '"{\"traceId\":\"66ed579ea44c9710d4983c630c023780\",\"spanId\":\"c789ad6d4d5113b1\",\"trace\":\"66ed579ea44c9710d4983c630c023780\",\"traceparent\":\"00-66ed579ea44c9710d4983c630c023780-c789ad6d4d5113b1-00\"}"'
WHERE ID IN
(SELECT ID
FROM BP_FLOW_EXECUTION
WHERE ENV['TRACE_CONTEXT']::text LIKE '%uber-trace-id%' AND STATUS IN ('SCHEDULED', 'WAITING', 'IN_PROGRESS'));
If the SELECT row count and the "UPDATE _" count do not agree, roll back the transaction. This undoes the UPDATE query, back to the keyword BEGIN. To roll back, run: ROLLBACK;
Otherwise, if the figures do agree, run the following command to commit the update transaction:
COMMIT;
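The commit-or-rollback rule used in steps 7a to 7c can be written down as a small check. The sketch below assumes the two psql outputs are copied in by hand: the SELECT footer, such as (7 rows), and the UPDATE result, such as UPDATE 7. The function name txn_decision is hypothetical. It only prints which statement to run and does not touch the database.

```shell
# Hypothetical helper: compares the psql SELECT footer with the UPDATE
# result and prints the statement that should end the transaction.
txn_decision() {
  select_footer="$1"   # e.g. "(7 rows)" or "(1 row)"
  update_tag="$2"      # e.g. "UPDATE 7"

  # Extract the numeric counts; either stays empty if the text is unexpected.
  select_count=$(printf '%s\n' "$select_footer" | sed -n 's/^(\([0-9][0-9]*\) rows\{0,1\})$/\1/p')
  update_count=$(printf '%s\n' "$update_tag" | sed -n 's/^UPDATE \([0-9][0-9]*\)$/\1/p')

  # Commit only when both counts were parsed and agree.
  if [ -n "$select_count" ] && [ "$select_count" = "$update_count" ]; then
    echo "COMMIT;"
  else
    echo "ROLLBACK;"
  fi
}
```

For example, txn_decision '(7 rows)' 'UPDATE 7' prints COMMIT;, while any mismatch or unparseable input prints ROLLBACK;, so the default is the safe choice.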
8. Exit the database with \q or Ctrl+D.
9. Restart the tango-blueprint-service-app pod.
For a 3-node cluster:
kubectl scale deployment -n prelude tango-blueprint-service-app --replicas=3
For a single-node system:
kubectl scale deployment -n prelude tango-blueprint-service-app --replicas=1
VMware Engineering are working on a permanent fix in code, so that upgraded systems will not be affected by in-progress or waiting deployments.