The appliance disks fill up frequently due to vmo_tokenreplay growing rapidly when using VMware Aria Automation or Automation Orchestrator

Article ID: 326109

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  • The VMware Aria Automation or Automation Orchestrator cluster is not healthy and some pods fail.
  • You see a status similar to:
    NAME          READY   STATUS                      RESTARTS    AGE
    Pod-name      0/1     Init:ErrImageNeverPull      12          14h
  • The VMware Aria Automation or Orchestrator appliance data partition (/data) disk usage is above 80% and fills up frequently, even if more disk space is added.
    For example:
    root@vra-appliance [ / ]# df -h /data
    Filesystem                    Size Used Avail Use% Mounted on
    /dev/mapper/data_vg-data      196G 157G  30G  85%  /data
  • The PostgreSQL database "vco-db" is more than a few gigabytes in size and growing quickly.
    For example:
    template1=# SELECT pg_database.datname as "database_name", pg_database_size(pg_database.datname)/1024/1024 AS size_in_mb FROM pg_database ORDER by size_in_mb DESC;
    
       database_name    | size_in_mb 
    --------------------+------------
     vco-db             |      77000
     provisioning-db    |         66
     catalog-db         |         12
  • The vmo_tokenreplay table in the vco-db PostgreSQL database is more than a few gigabytes in size and growing quickly.
    For example:
    template1=# \c vco-db
    You are now connected to database "vco-db" as user "postgres".
    vco-db=# SELECT
       relname as "Table",
       pg_size_pretty(pg_total_relation_size(relid)) As "Size" 
       FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 5;
              Table          |  Size       
    -------------------------+-------------
     vmo_tokenreplay         | 53536123 kB
     vmo_vroconfiguration    | 544 kB      
     vmo_scriptmodule        | 520 kB      
     vmo_scriptmodulecontent | 464 kB      
     vmo_contentsignature    | 456 kB    
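
If the /data partition keeps filling up and the cause is not obvious, a quick way to find the largest consumers is a disk-usage listing. This is a generic Linux check (not part of the original diagnostics) and the output depends on your deployment; in this scenario it typically points to the embedded PostgreSQL data directory:

root@vra-appliance [ / ]# du -sh /data/* 2>/dev/null | sort -h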


Environment

VMware vRealize Orchestrator 8.x
VMware vRealize Automation 8.x

Cause

This issue occurs when Kubernetes experiences disk pressure, which is standard behavior once usage of the data disk exceeds 80%. Kubernetes attempts to free space by deleting locally available Docker images, which causes some of the services to fail.
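
To confirm that image garbage collection under disk pressure is what removed the images, you can inspect the node conditions and recent events. These are generic kubectl checks (assumed to apply here because kubectl is already used on the appliance) and the exact output varies by environment:

root@vra-appliance [ ~ ]# kubectl describe nodes | grep -i -A 2 DiskPressure
root@vra-appliance [ ~ ]# kubectl get events --all-namespaces | grep -i -E "diskpressure|evict|image"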

Resolution

This issue is resolved in VMware vRealize Automation 8.2.0, available at VMware Downloads.

Workaround:
To work around this issue if you do not want to upgrade:

Note: Before proceeding with the steps below, VMware recommends backing up the vRA/vRO system using snapshots without stopping the VMs.
  1. Log in to one of the vRA/vRO nodes via SSH.

    Note: In case of cluster deployments, complete the steps below:

     a. Identify the primary postgres pod using the "vracli status" command:

    For example:

    root@vra-appliance [ ~ ]# vracli status | grep primary -B 2
            "Total data size": "263 MB",
            "Conninfo": "host=postgres-1.postgres.prelude.svc.cluster.local dbname=repmgr-db user=repmgr-db passfile=/scratch/repmgr-db.cred connect_timeout=10",
            "Role": "primary",


     b. Identify the vRA node running the primary DB pod with the "kubectl -n prelude get pods -o wide" command:

    For example:

    root@vra-appliance [ ~ ]# kubectl -n prelude get pods -o wide| grep postgres-1
    postgres-1    1/1     Running   0    15h   10.244.1.156   vra-appliance.domain.com   <none>           <none>

     
  2. Log in to the vRA/vRO primary DB node via SSH.
  3. Connect to the PostgreSQL database and delete the vRO token replay table content:

    vracli psql dev
    template1=# \c vco-db
    You are now connected to database "vco-db" as user "postgres".
    vco-db=# TRUNCATE table vmo_tokenreplay;
    TRUNCATE TABLE
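
    Optionally, before disconnecting, verify that the table is now empty and that its disk space has been released (TRUNCATE frees the table's storage immediately, unlike DELETE). These are standard PostgreSQL queries:

    vco-db=# SELECT count(*) FROM vmo_tokenreplay;
    vco-db=# SELECT pg_size_pretty(pg_total_relation_size('vmo_tokenreplay'));
    vco-db=# \q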

     
  4. On each vRA/vRO appliance node, execute the command below to restore the deleted Docker images:

    /opt/scripts/restore_docker_images.sh
     
  5. Wait until the vRA/vRO cluster is healthy and all pods are in running state.
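
    You can check this with the same command used earlier; all pods should eventually report READY 1/1 and STATUS Running:

    root@vra-appliance [ ~ ]# kubectl -n prelude get pods
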
  6. On each vRA/vRO appliance node, disable the vRO token replay feature with this command:

    rm /data/vco/usr/lib/vco/app-server/extensions/tokenreplay-8.x.0.jar

    Note: The filename differs based on the vRA/vRO product version:

    8.1: tokenreplay-8.1.0.jar
    8.2: tokenreplay-8.2.0.jar

    The file will be created again on each execution of /opt/scripts/deploy.sh. If you need to run the deploy.sh script, delete the tokenreplay jar file again.
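
    To confirm the feature is disabled on a node, you can check that no tokenreplay jar remains in the extensions directory (a simple filesystem check; no output means the file has been removed):

    root@vra-appliance [ ~ ]# ls /data/vco/usr/lib/vco/app-server/extensions/ | grep -i tokenreplay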