Selecting Dashboard > Largest token in UI crashes vCO service with 502, OK message


Article ID: 368965


Products

VMware Aria Suite

Issue/Introduction

  • When navigating to Dashboard > Largest token in the UI, the vCO Kubernetes pod exceeds its memory limit and is killed by the OOM killer. The systemd journal logs contain errors similar to:

May 13 13:14:46 <Hostname> kernel: GC Thread#16 invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=941
May 13 13:14:46 <Hostname> kernel: Memory cgroup out of memory: Kill process 2650547 (java) score 1938 or sacrifice child
May 13 13:14:46 <Hostname> kernel: Killed process 2650547 (java) total-vm:19878224kB, anon-rss:8720004kB, file-rss:32384kB, shmem-rss:0kB

  • Querying the vCO database with select * from vmo_workflowtokenstatistics ORDER BY tokensize DESC LIMIT 20; shows workflow token runs that are multiple MB in size.


  • Similarly, the query select * from vmo_workflowtokenstatistics s, vmo_workflowtoken e where s.tokenid=e.id order by s.tokensize desc limit 1; returns a workflow token run with a blank tokensize value, indicating a failure during compression/decompression of the stored content.

Environment

Aria Automation Orchestrator 8.x

Cause

The crash can occur when large data is stored in workflow runs. This triggers a defect in the library used to compress and decompress the content, resulting in a failure to calculate the statistics for the workflow run.

In particular, it occurs when workflows work with files, load their contents into memory, and keep the resulting large strings as workflow variables.

Resolution

The problematic library is intended to be replaced in the Aria Automation 8.18.1 release.


To prevent the issue from occurring, avoid storing unnecessary data in your workflow runs. When you are finished with a variable that may contain large string data, set it to an empty string so the data is not persisted in the Orchestrator database. If possible, do not store such data in workflow variables at all.
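The guidance above can be sketched as an Orchestrator scriptable-task pattern (JavaScript). The function and variable names here are illustrative, not part of the vRO API; the point is that the large string is released before the task ends, so only the small derived result is left to persist:

```javascript
// Hypothetical scriptable-task logic: load large content, derive a small
// result, then release the large string so it is not kept as a variable.
function processLargeFile(readFile) {
    // readFile is a stand-in for however the workflow obtains the content
    // (e.g. a file reader); injected here so the sketch is self-contained.
    var content = readFile();                    // potentially multi-MB string
    var lineCount = content.split("\n").length;  // do the actual work
    content = "";                                // release the large data
    return lineCount;                            // only the small result remains
}
```

In a real workflow, bind only the small result (here, the line count) to an output attribute; do not bind the raw file content to any workflow attribute.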


To clean up the largest tokens, use the following queries. Only proceed after taking a snapshot of the appliance:

1. SSH to the appliance and connect to the database instance:

vracli dev psql

2. Connect to the vCO db:

\c vco-db

3. Remove the largest workflow token runs (you can adjust the size threshold in the queries if needed):

delete from vmo_workflowtoken where id in (select tokenid from vmo_workflowtokenstatistics s where (s.tokensize is null or s.tokensize >= 10000000));
delete from vmo_workflowtokencontent where workflowtokenid in (select tokenid from vmo_workflowtokenstatistics s where (s.tokensize is null or s.tokensize >= 10000000));
delete from vmo_workflowtokenstatistics where tokenid in (select tokenid from vmo_workflowtokenstatistics s where (s.tokensize is null or s.tokensize >= 10000000));
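4. Optionally, re-run the size check from the Issue section to confirm the cleanup succeeded. With the same 10000000-byte threshold as the delete queries above, this should return no rows:

select * from vmo_workflowtokenstatistics s where s.tokensize is null or s.tokensize >= 10000000;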



Alternatively, the issue can be mitigated by increasing the memory assigned to the affected Orchestrator instance.