How to Determine the Size of the Table without Bloat
search cancel

How to Determine the Size of the Table without Bloat

book

Article ID: 295366

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

This article describes how to know the actual size of data of a given table without calculating the bloat.

 


Environment


Resolution

In the below query "Expected" refers to the size of the table (i.e if you don't consider bloat) and "Reclaimable or claimable" refers to the difference between current size minus expected size.

Query

-- To obtain the expected and reclaimable size of only those tables that has high bloat ( i.e it list only those tables that have the highest bloat on the database).

SELECT bdinspname||'.'||bdirelname "Relation name",
	bdirelpages::bigint "Curnt Pages",
        round((bdirelpages*33)/1024)::bigint "Curnt Size(MB)",
	bdiexppages::bigint "Exped Pages",
	round((bdiexppages*33)/1024)::bigint "Exped Size(MB)",
	(bdirelpages - bdiexppages)::bigint "Claimble Pages",
	round(((bdirelpages - bdiexppages)*33)/1024)::bigint "Claimble Size(MB)",
	bdidiag "Comments"
FROM gp_toolkit.gp_bloat_diag
ORDER BY 6;

-- To obtain the expected and reclaimable size of all the table in the database:

SELECT n.nspname||'.'||c.relname "Relation Name",
	btdrelpages::bigint "Curnt Pages",
        round((btdrelpages*33)/1024)::bigint "Curnt Size(MB)",
	btdexppages::bigint "Exped Pages",
	round((btdexppages*33)/1024)::bigint "Exped Size(MB)",
	(btdrelpages - btdexppages)::bigint "Claimble Pages",
	round(((btdrelpages - btdexppages)*33)/1024)::bigint "Claimble Size(MB)"
FROM gp_toolkit.gp_bloat_expected_pages e, pg_class c, pg_namespace n 
WHERE e.btdrelid=c.oid 
AND n.oid=c.relnamespace 
ORDER BY 6;

Note:

  1. The above values are approximate and depend on the stats of the table, analyze the tables to have a somewhat closer estimate.
  2. The value of 33 used above is after rounding of 32k (32*1024) and to avoid the integer buffer limit (ERROR: integer out of range) when the pages are too high.

How it works and calculated

Run the gp_toolkit.gp_bloat_diag to know the tables having bloat:

flightdata=# select * from gp_toolkit.gp_bloat_diag;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages |              bdidiag
----------+------------+------------+-------------+-------------+------------------------------------
    92783 | public     | test2      |      618870 |       71503 | moderate amount of bloat suspected
(1 row)

Checking the values in the pg_class for confirmation:

flightdata=# select relname,relpages,reltuples::bigint from pg_class where relname='test2';
 relname | relpages | reltuples
---------+----------+-----------
 test2   |   618870 |  71000000
(1 row)

So, based on the output from gp_bloat_diag the currently allocated OS pages is 618870 and the expected pages based on the data in the table should be 71503.

Since one OS page is equal to 32k ( i.e 32 * 1024 bytes ) in GPDB

The currently allocated size for the table is 618870 * 32k = 19 GB Approx

And the expected pages would be 71503 * 32k = 2G Approx

Taking it practically, the current size of the table is:

flightdata=# SELECT pg_size_pretty(pg_relation_size('test2'));
 pg_size_pretty
----------------
 19 GB
(1 row)

Remove bloat completely ( either by vacuum full or other options indicated in the article ), vacuum full is chosen here:

flightdata=# vacuum full test2;
VACUUM

And, then analyze:flightdata=# analyze test2; ANALYZE

flightdata=# analyze test2;
ANALYZE

The current size of the table is 2G which is approx to the calculation we did above:

flightdata=# SELECT pg_size_pretty(pg_relation_size('test2'));
 pg_size_pretty
----------------
 2441 MB
(1 row)

flightdata=# select relname,relpages,reltuples::bigint from pg_class where relname in ('test2');
 relname | relpages | reltuples
---------+----------+-----------
 test2   |    78108 |  71000168
(1 row)

Note- It is an approx value, but you will get an idea of how much you can reclaim after you redistribute / vacuum the bloated table.

 

Additional Information

+ Environment:

Pivotal Greenplum Database (GPDB) all versions