vIDM 3.3.x vPostgres DB OAuth2RefreshToken table consumes most space on the appliance leading to service outages
search cancel

vIDM 3.3.x vPostgres DB OAuth2RefreshToken table consumes most space on the appliance leading to service outages

book

Article ID: 322682

calendar_today

Updated On:

Products

VMware Aria Suite

Issue/Introduction

Symptoms:

  • vIDM DB partition is full or nearly out of space.
  • To validate which DB partition is using most of the space run the following commands:
    du -sh /db/elasticsearch
    du -sh /db/rabbitmq
    du -sh /db/data

Note: This will detail and display the /db/data volume taking most of the space. If it is one of the other disk trees, please see other KBs (i.e. audit data).



Environment

VMware Identity Manager 3.3.x

Resolution

Prerequisite:

  • Ensure you have root credentials for the appliances.
  • Take a valid backup or a snapshot before proceeding.

Steps to clean up excessive disk space consumed by OAuth2RefreshToken table.

  1. SSH into the primary appliance where the Master role for vPostgres is located
  2. Stop the horizon-workspace service on all nodes
    # service horizon-workspace stop
  3. Find the largest table sizes
    # export PGPASSWORD=`cat /usr/local/horizon/conf/db.pwd`
    
    # /opt/vmware/vpostgres/current/bin/psql -U postgres -d saas
    
    SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
    FROM pg_class C
    LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
    ORDER BY pg_total_relation_size(C.oid) DESC
    LIMIT 20;
  4. Remove the data from the table by running a truncate
    # truncate table "OAuth2RefreshToken" CASCADE;
  5. Reindex the database and vacuum. Note: if the disk is too full or approaching 100%, extending the disk prior to the following steps would be recommended. Vacuum Full operation will need extra space for the process to complete.
    # select count (*) from "OAuth2RefreshToken";
    # select count (*) from "OAuth2AccessToken";
    # select count (*) from "SuiteTokenCache";
    # vacuum full verbose;
    # reindex table "OAuth2RefreshToken";
    # vacuum full verbose;
    # \q
  6. Restart the vPostgres service
    # service vpostgres restart
  7. Start the horizon-workspace service on all nodes
    # service horizon-workspace start
  8. Wait for 10 minutes after restart to allow services to fully initialize.
  9. Count the number of rows in the OAuth2RefreshToken table
    # /opt/vmware/vpostgres/current/bin/psql -U postgres -d saas
    # select count (*) from "OAuth2RefreshToken";
 
Additional Workaround:
If on step 4 above, the TRUNCATE query does not produce any output at all after several minutes, we may need to find and forcible stop a task:
  1. If TRUNCATE has given no output at all, it is safe to use Ctrl+C to stop the request
  2. Use this SQL query to find psql replication processes, which may be stuck:
    # SELECT pid,application_name FROM pg_stat_replication;
  3. You may see application_name = walreceiver results which remain whenever checked. Use this SQL query to forcibly stop a process:
    # SELECT pg_terminate_backend(<stuck_pid>);
  4. Return to step 2 here to check if any of the same processes are still here.
  5. You can now return to step 4 of the main resolution steps: attempt to truncate the DB once again and continue from there.

Additional Information

You can also expand the /db file system via logical volume management.