vIDM 3.3.x vPostgres DB OAuth2RefreshToken table consumes most space on the appliance leading to service outages
book
Article ID: 322682
calendar_today
Updated On:
Products
VMware Aria Suite
Issue/Introduction
Symptoms:
vIDM DB partition is full or nearly out of space.
To validate which DB partition is using most of the space run the following commands:
du -sh /db/elasticsearch
du -sh /db/rabbitmq
du -sh /db/data
Note: This will detail and display the /db/data volume taking most of the space. If it is one of the other disk trees, please see other KBs (i.e. audit data).
Environment
VMware Identity Manager 3.3.x
Resolution
Prerequisites
You have root access to the appliance.
You have valid backups or snapshots.
Steps to clean up excessive disk space consumed by OAuth2RefreshToken table.
SSH into the primary appliance where the Master role for vPostgres is located
Stop the horizon-workspace service on all nodes
service horizon-workspace stop
Find the largest table sizes
export PGPASSWORD=`cat /usr/local/horizon/conf/db.pwd`
/opt/vmware/vpostgres/current/bin/psql -U postgres -d saas
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
Remove the data from the table by running a truncate
truncate table "OAuth2RefreshToken" CASCADE;
Reindex the database and vacuum. Note: if the disk is too full or approaching 100%, extending the disk prior to the following steps would be recommended. Vacuum Full operation will need extra space for the process to complete.
select count (*) from "OAuth2RefreshToken";
select count (*) from "OAuth2AccessToken";
select count (*) from "SuiteTokenCache";
vacuum full verbose;
reindex table "OAuth2RefreshToken";
vacuum full verbose;
\q
Restart the vPostgres service
service vpostgres restart
Start the horizon-workspace service on all nodes
service horizon-workspace start
Wait for 10 minutes after restart to allow services to fully initialize.
Count the number of rows in the OAuth2RefreshToken table
/opt/vmware/vpostgres/current/bin/psql -U postgres -d saas
select count (*) from "OAuth2RefreshToken";
Additional Workaround
If on step 4 above, the TRUNCATE query does not produce any output at all after several minutes, we may need to find and forcible stop a task:
If TRUNCATE has given no output at all, it is safe to use Ctrl+C to stop the request
Use this SQL query to find psql replication processes, which may be stuck:
SELECT pid,application_name FROM pg_stat_replication;
You may see application_name = walreceiver results which remain whenever checked. Use this SQL query to forcibly stop a process:
SELECT pg_terminate_backend(<stuck_pid>);
Return to step 2 here to check if any of the same processes are still here.
You can now return to step 4 of the main resolution steps: attempt to truncate the DB once again and continue from there.