vRealize Orchestrator PostgreSQL database size is abnormally inflated

Article ID: 318464

Updated On: 02-27-2024

Products

VMware Aria Suite

Issue/Introduction

Symptoms:
  • The vRealize Orchestrator database is very large and continues to grow unexpectedly.
  • The vmo_workflowtokencontent table is inflated to gigabytes in size.
For example:
  • Run this query to show the total size of each vRO (vmo_*) table in the embedded PostgreSQL database:
    SELECT nspname || '.' || relname AS "relationName",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
    FROM pg_class C
    LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
    AND relname LIKE 'vmo%'
    ORDER BY pg_total_relation_size(C.oid) DESC;
  • The following entries are visible within the PostgreSQL log:
    • canceling autovacuum task - indicates that the autovacuum process, which removes dead row versions and compacts TOASTed data, is being canceled before it can complete, so that space is not reclaimed.
    • checkpoints are occurring too frequently - indicates that the write-ahead log (WAL) is too small to accommodate the rate of data change in the database within the 5-minute checkpoint window. The WAL stores a log of committed transactions that update data pages in shared_buffers, which are ultimately flushed to disk during checkpoints. When the volume of transactions exceeds max_wal_size before a timed checkpoint runs, backend server processes request a checkpoint to flush changes to disk and free up WAL segments for new transactions.
    • In addition, inspect the shared_buffers setting in the PostgreSQL configuration. All data read from and written to the database must first pass through this buffer, and in some deployments it is only 128 MB. This value may need to be increased to handle a large I/O volume. See the example queries after this list to review the checkpoint counters and the current shared_buffers value.
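
As a minimal sketch of how to confirm these conditions (assuming psql access to the embedded vRealize Orchestrator database, and that the embedded PostgreSQL version exposes checkpoint counters in pg_stat_bgwriter), the following queries report the checkpoint counters and the current shared_buffers value. A checkpoints_req count that is high relative to checkpoints_timed suggests that the WAL is too small for the write rate:
    -- Requested vs. scheduled checkpoints since the last statistics reset.
    -- Many requested checkpoints indicate the WAL cannot absorb the write volume.
    SELECT checkpoints_timed, checkpoints_req
    FROM pg_stat_bgwriter;

    -- Current shared_buffers value; a default of 128 MB is often too small here.
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name = 'shared_buffers';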


Environment

VMware vRealize Orchestrator 7.x

Cause

This issue occurs when the autovacuum process is unable to run to completion, so dead row versions accumulate and the table size continues to grow abnormally.
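
One way to confirm this (assuming psql access to the embedded database) is to check the dead-row counters and the last successful autovacuum run for the vRO tables; a large n_dead_tup together with an old or empty last_autovacuum suggests that autovacuum is not keeping up:
    -- Dead-row and autovacuum statistics for the vRO tables.
    SELECT relname,
           n_live_tup,
           n_dead_tup,
           last_autovacuum,
           autovacuum_count
    FROM pg_stat_user_tables
    WHERE relname LIKE 'vmo%'
    ORDER BY n_dead_tup DESC;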

Resolution

To resolve this issue, assign more resources to the PostgreSQL service:
  1. Secure Shell (SSH) into the virtual appliance.
  2. Edit the /storage/db/pgdata/postgresql.conf file.
  3. Find the following lines, uncomment them if necessary, and edit the values to match the below:
    shared_buffers = 2GB
    maintenance_work_mem = 512MB
    min_wal_size = 512MB
    max_wal_size = 2GB
    checkpoint_completion_target = 0.8
Note: Empirical testing must be performed to find the values suitable for a given environment. These are good starting values, but they may need to be adjusted if issues are still seen with autovacuum processes. A verification example follows this note.
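
Assuming psql access to the embedded database, the following sketch confirms the values in effect after the change. Most of these parameters take effect on a configuration reload, while shared_buffers requires a restart of the PostgreSQL service (the exact restart procedure depends on the appliance version):
    -- Each SHOW reports the value currently in effect for that parameter.
    SHOW shared_buffers;
    SHOW maintenance_work_mem;
    SHOW min_wal_size;
    SHOW max_wal_size;
    SHOW checkpoint_completion_target;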


Workaround:
If providing additional resources does not fully resolve the issue, continue with the workaround below.
  1. Manually VACUUM FULL the table:
    VACUUM (FULL, VERBOSE, ANALYZE) vmo_workflowtokencontent;
    Example output:
    INFO: vacuuming "public.vmo_workflowtokencontent"
    INFO: "vmo_workflowtokencontent": found 0 removable, 17 nonremovable row versions in 2 pages
    DETAIL: 0 dead row versions cannot be removed yet.
    CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
    INFO: analyzing "public.vmo_workflowtokencontent"
    INFO: "vmo_workflowtokencontent": scanned 2 of 2 pages, containing 17 live rows and 0 dead rows; 17 rows in sample, 17 estimated total rows
    VACUUM
  2. Update the autovacuum configuration to allow more frequent vacuum attempts:
    ALTER TABLE vmo_workflowtokencontent SET (autovacuum_vacuum_scale_factor = 0.05);
    ALTER TABLE vmo_workflowtokencontent SET (autovacuum_vacuum_threshold = 25);
    ALTER TABLE vmo_workflowtokencontent SET (autovacuum_vacuum_cost_delay = 10);
    ALTER TABLE vmo_workflowtokencontent SET (autovacuum_analyze_threshold = 25);
    ALTER TABLE vmo_workflowtokencontent SET (toast.autovacuum_vacuum_scale_factor = 0.05);
    ALTER TABLE vmo_workflowtokencontent SET (toast.autovacuum_vacuum_threshold = 25);
    ALTER TABLE vmo_workflowtokencontent SET (toast.autovacuum_vacuum_cost_delay = 10);
  3. Validate the new autovacuum settings:
    SELECT relname, pg_options_to_table(reloptions) AS reloption
        FROM pg_class
        WHERE reloptions IS NOT NULL
            AND relnamespace = 'public'::regnamespace
        ORDER BY relname, reloption;
  4. Verify that the manual VACUUM FULL command reduced the size of the vmo_workflowtokencontent table, for example with the size query shown below.
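
As a sketch, the total on-disk size of the table (including its TOAST data and indexes) can be compared before and after the VACUUM FULL:
    -- Total size of the table, including TOAST data and indexes.
    SELECT pg_size_pretty(pg_total_relation_size('vmo_workflowtokencontent'::regclass));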