The appliance disks fill up frequently due to vmo_tokenreplay growing rapidly when using VMware Aria Automation or Automation Orchestrator

Article ID: 326109

Updated On:

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  • A VMware Aria Automation or Automation Orchestrator cluster is not healthy and some pods fail.
  • You see a status similar to:
    NAME          READY   STATUS                      RESTARTS    AGE
    Pod-name      0/1     Init:ErrImageNeverPull      12          14h
  • The VMware Aria Automation or Orchestrator appliance data partition (/data) disk usage is above 80% and fills up frequently, even if more disk space is added.
    For example:
    root@vra-appliance [ / ]# df -h /data
    Filesystem                    Size Used Avail Use% Mounted on
    /dev/mapper/data_vg-data      196G 157G  30G  85%  /data
  • The PostgreSQL database "vco-db" is more than a few gigabytes in size and growing quickly.
    For example:
    template1=# SELECT pg_database.datname as "database_name", pg_database_size(pg_database.datname)/1024/1024 AS size_in_mb FROM pg_database ORDER by size_in_mb DESC;
    
       database_name    | size_in_mb 
    --------------------+------------
 vco-db             |      77000
 provisioning-db    |         66
 catalog-db         |         12
  • The vmo_tokenreplay table in the vco-db PostgreSQL database is more than a few gigabytes in size and growing quickly.
    For example:
    template1=# \c vco-db
    You are now connected to database "vco-db" as user "postgres".
    vco-db=# SELECT
       relname as "Table",
       pg_size_pretty(pg_total_relation_size(relid)) As "Size" 
       FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 5;
              Table          |  Size       
    -------------------------+-------------
     vmo_tokenreplay         | 53536123 kB
     vmo_vroconfiguration    | 544 kB      
     vmo_scriptmodule        | 520 kB      
     vmo_scriptmodulecontent | 464 kB      
     vmo_contentsignature    | 456 kB    


Environment

VMware vRealize Orchestrator 8.x
VMware vRealize Automation 8.x

Cause

This issue occurs when Kubernetes experiences disk pressure, which is standard behavior once data disk usage exceeds 80%. Kubernetes attempts to free space by deleting the locally available Docker images, which causes some of the services to fail.
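
To confirm that the node is under disk pressure and that the kubelet has removed cached images, you can check the node conditions and recent events (a hedged check; exact event names vary by Kubernetes version):

    root@vra-appliance [ ~ ]# kubectl describe nodes | grep -i "DiskPressure"
    root@vra-appliance [ ~ ]# kubectl get events --all-namespaces | grep -i -E "Evict|FreeDiskSpaceFailed|ImageGC"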

Resolution

This issue is resolved in VMware vRealize Automation 8.2.0, available at Broadcom Downloads.

Workaround:
To work around this issue if you do not want to upgrade:

Note: Before proceeding with the steps below, VMware recommends backing up the vRA/vRO system using snapshots, without stopping the VMs.

  1. SSH login to one of the vRA/vRO nodes.

    Note: In case of cluster deployments, complete the steps below:

    a. Identify the primary postgres pod using the "vracli status" command:

    For example:

    root@vra-appliance [ ~ ]# vracli status | grep primary -B 2
            "Total data size": "263 MB",
            "Conninfo": "host=postgres-1.postgres.prelude.svc.cluster.local dbname=repmgr-db user=repmgr-db passfile=/scratch/repmgr-db.cred connect_timeout=10",
            "Role": "primary",


    b. Identify the vRA node running the primary DB pod with the "kubectl -n prelude get pods -o wide" command:

    For example:

    root@vra-appliance [ ~ ]# kubectl -n prelude get pods -o wide| grep postgres-1
    postgres-1    1/1     Running   0    15h   10.244.1.156   vra-appliance.domain.com   <none>           <none>

     
  2. SSH login to the vRA/vRO primary DB node.
  3. Connect to the PostgreSQL database and delete the vRO token replay table content:

    vracli psql dev
    template1=# \c vco-db
    You are now connected to database "vco-db" as user "postgres".
    vco-db=# TRUNCATE table vmo_tokenreplay;
    TRUNCATE TABLE
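
     To confirm that the space has been released, you can re-check the table size from the same psql session (a hedged check; the reported size will vary by environment):

     vco-db=# SELECT pg_size_pretty(pg_total_relation_size('vmo_tokenreplay'));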

     
  4. On each vRA/vRO appliance node, execute the command below to restore the deleted Docker images:

    /opt/scripts/restore_docker_images.sh
     
  5. Wait until the vRA/vRO cluster is healthy and all pods are in the Running state.
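
     For example, you can check the pod and cluster state with the commands below (all pods should eventually report READY 1/1 and STATUS Running):

     kubectl -n prelude get pods
     vracli status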
  6. On each vRA/vRO appliance node, disable the vRO token replay feature with this command:

    rm /data/vco/usr/lib/vco/app-server/extensions/tokenreplay-8.x.0.jar

    Note: The filename differs based on the vRA/vRO product version:

    8.1: tokenreplay-8.1.0.jar
    8.2: tokenreplay-8.2.0.jar

    The file will be created again on each execution of /opt/scripts/deploy.sh. If you need to run the deploy.sh script, delete the token replay jar file again.
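
    To confirm that the extension has been removed on a node, you can list the extensions directory (the exact jar filename depends on your product version):

    ls /data/vco/usr/lib/vco/app-server/extensions/ | grep -i tokenreplay

    No output means the token replay jar file has been deleted.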