This article describes how to know the actual size of data of a given table without calculating the bloat.
In the below query "Expected" refers to the size of the table (i.e if you don't consider bloat) and "Reclaimable or claimable" refers to the difference between current size minus expected size.
Query
-- To obtain the expected and reclaimable size of only those tables that has high bloat ( i.e it list only those tables that have the highest bloat on the database).
SELECT bdinspname||'.'||bdirelname "Relation name", bdirelpages::bigint "Curnt Pages", round((bdirelpages*33)/1024)::bigint "Curnt Size(MB)", bdiexppages::bigint "Exped Pages", round((bdiexppages*33)/1024)::bigint "Exped Size(MB)", (bdirelpages - bdiexppages)::bigint "Claimble Pages", round(((bdirelpages - bdiexppages)*33)/1024)::bigint "Claimble Size(MB)", bdidiag "Comments" FROM gp_toolkit.gp_bloat_diag ORDER BY 6;
-- To obtain the expected and reclaimable size of all the table in the database:
SELECT n.nspname||'.'||c.relname "Relation Name", btdrelpages::bigint "Curnt Pages", round((btdrelpages*33)/1024)::bigint "Curnt Size(MB)", btdexppages::bigint "Exped Pages", round((btdexppages*33)/1024)::bigint "Exped Size(MB)", (btdrelpages - btdexppages)::bigint "Claimble Pages", round(((btdrelpages - btdexppages)*33)/1024)::bigint "Claimble Size(MB)" FROM gp_toolkit.gp_bloat_expected_pages e, pg_class c, pg_namespace n WHERE e.btdrelid=c.oid AND n.oid=c.relnamespace ORDER BY 6;
Note:
How it works and calculated
Run the gp_toolkit.gp_bloat_diag to know the tables having bloat:
flightdata=# select * from gp_toolkit.gp_bloat_diag; bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag ----------+------------+------------+-------------+-------------+------------------------------------ 92783 | public | test2 | 618870 | 71503 | moderate amount of bloat suspected (1 row)
Checking the values in the pg_class for confirmation:
flightdata=# select relname,relpages,reltuples::bigint from pg_class where relname='test2'; relname | relpages | reltuples ---------+----------+----------- test2 | 618870 | 71000000 (1 row)
So, based on the output from gp_bloat_diag the currently allocated OS pages is 618870 and the expected pages based on the data in the table should be 71503.
Since one OS page is equal to 32k ( i.e 32 * 1024 bytes ) in GPDB
The currently allocated size for the table is 618870 * 32k = 19 GB Approx
And the expected pages would be 71503 * 32k = 2G Approx
Taking it practically, the current size of the table is:
flightdata=# SELECT pg_size_pretty(pg_relation_size('test2')); pg_size_pretty ---------------- 19 GB (1 row)
Remove bloat completely ( either by vacuum full or other options indicated in the article ), vacuum full is chosen here:
flightdata=# vacuum full test2; VACUUM
And, then analyze:flightdata=# analyze test2; ANALYZE
flightdata=# analyze test2; ANALYZE
The current size of the table is 2G which is approx to the calculation we did above:
flightdata=# SELECT pg_size_pretty(pg_relation_size('test2')); pg_size_pretty ---------------- 2441 MB (1 row) flightdata=# select relname,relpages,reltuples::bigint from pg_class where relname in ('test2'); relname | relpages | reltuples ---------+----------+----------- test2 | 78108 | 71000168 (1 row)
Note- It is an approx value, but you will get an idea of how much you can reclaim after you redistribute / vacuum the bloated table.
Pivotal Greenplum Database (GPDB) all versions