The document describes various options to remove bloat in database tables.
NOTE: analyze the table after any of the following activities to ensure the table statistics are up-to-date to allow the planner/optimizer to generate better plans.
To understand more about the table bloat please refer to the articles:
How to deal with Bloat in Heap tables in VMware Tanzu Greenplum
How to Determine the Size of the Table without Bloat to identify
Below are the various ways to remove the table bloat.
This can be slow, especially in version 5.x and below. In 5.x and below the VACUUM FULL command moves the data within the file. In 6.x and above the data is written without bloat to a new file and is much faster, but can still take a long time to run on large tables.
The command to execute a VACUUM FULL on the bloated table is:
VACUUM FULL <schema-name>.<table-name>;
NOTE: Any bloated catalog table in the database (that is all tables under pg_catalog schema) must use this method, thus removing highly bloated catalog tables can be very time-consuming.
NOTE: Running "VACUUM" rather than "VACUUM FULL" regularly enough will avoid the need to run "VACUUM FULL". A "VACUUM" is much faster than a "VACUUM FULL". How often to run "VACUUM" depends on the rate of change to the data in the table. See KB article for more details
When redistributing the data in a table the database writes the data to a new file and once done removes the old file.
Refer to the article How to Remove Fragmentation (Bloat) on a Relation Using Redistribute on Tanzu Greenplum for details on how to redistribute a table to remove bloat.
NOTE: It is recommended to use the VACUUM command in Greenplum 6.x and above rather than the redistribute method.
CTAS is another quick method to remove bloat and using this method helps to avoid the table lock (EXCLUSIVE LOCK) which the above two methods acquire to do the operation, thus allowing the table to be read by end users while maintenance is being performed on the main table. But to ensure data is not lost during the procedure, users should not be updating or inserting into the table.
The disadvantage is that it involves many steps to perform the activity. The steps involved are:
INSERT INTO <schema-name>.<new-table-name> SELECT * FROM <schema-name>.<bloated-table-name>; ALTER TABLE <schema-name>.<bloated-table-name> RENAME TO <old-table-name>; ALTER TABLE <schema-name>.<new-table-name> RENAME TO <bloated-table-name>; -- Once users confirm everything is good. DROP TABLE <schema-name>.<old-table-name>;
This involves backing up the table and then restoring the back. The different tools that can be used to achieve this are:
gpbackup/gprestore
pg_dump / pg_restore
COPY .. TO .. / COPY .. FROM ..
Refer to Managing Bloat in a Database.