This article explains how VACUUM and VACUUM FULL works.
This article explains how VACUUM and VACUUM FULL works with an example.
When performing VACUUM, the dead space is reclaimed and made available for re-use by the same object or table etc.
VACUUM ANALYZE performs a VACUUM and then an ANALYZE for each selected table.
VACUUM FULL writes the entire content of the table into a new disk file and releases the wasted space back to OS. This causes a table-level lock on the table and slow speeds. VACUUM FULL should be avoided on a high load system.
With architecture changes in Greeplum version 6 VACUUM FULL can be run online for users tables. More details can be found in article Vacuum and Vacuum Full for VMware Tanzu Greenplum 6.
When performing a VACUUM. the Greenplum (postgres) updates the free space map (FSM) and keep tracks of the free space. Subsequent updates on the relation will be tracked in the FSM through the internal table and used when needed. Issuing a Vacuum Verbose will produce the following:
INFO: "vacuum_test": removed 1333496 row versions in 2127 pages (seg17 sdw3:11039 pid=18570) INFO: "vacuum_test": removed 1333104 row versions in 2126 pages (seg2 sdw1:11036 pid=10693) INFO: "vacuum_test": found 1333496 removable, 583324 nonremovable row versions in 3048 pages (seg17 sdw3:11039 pid=18570) DETAIL: 0 dead row versions cannot be removed yet. ......... .......... INFO: "vacuum_test": removed 1333212 row versions in 2123 pages (seg1 sdw1:11035 pid=10687) INFO: "vacuum_test": found 1333212 removable, 583639 nonremovable row versions in 3048 pages (seg1 sdw1:11035 pid=10687) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 2124 pages contain useful free space. 0 pages are entirely empty.
Tune the parameter "max_fsm_relations" (default : 1000) and "max_fsm_pages" (default : 200000) to report on the master log when the FSM reaches a specific level.
Note: By design "max_fsm_pages" must be exceed 16 times "max_fsm_relations", or else the database will not start.
Create a test table.
gpadmin=# create table vacuum_test ( a int , b text ) ; NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table. HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. CREATE TABLE Time: 359.060 ms gpadmin=#
The value of the tuples and pages on the table:
gpadmin=# gpadmin=# select relname , relpages , reltuples from pg_class where relname='vacuum_test'; relname | relpages | reltuples -------------+----------+----------- vacuum_test | 0 | 0 (1 row) Time: 2.438 ms gpadmin=# gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758; bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag ----------+------------+------------+-------------+-------------+--------- (0 rows) Time: 25.589 ms gpadmin=#
Insert the test data:
gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test'); INSERT 0 10000000 Time: 6245.503 ms gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test'); INSERT 0 10000000 Time: 4697.305 ms gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test'); INSERT 0 10000000 Time: 4319.884 ms gpadmin=# gpadmin=#
The current usage of the table (per tuples or pages).
gpadmin=# select relname , relpages , reltuples from pg_class where relname='vacuum_test'; relname | relpages | reltuples -------------+----------+------------- vacuum_test | 15912 | 1.00086e+07 (1 row) Time: 2.585 ms Analyzing the table to know the correct data gpadmin=# analyze vacuum_test ; ANALYZE Time: 1789.892 ms gpadmin=# select relname , relpages , reltuples from pg_class where relname='vacuum_test'; relname | relpages | reltuples -------------+----------+------------- vacuum_test | 47710 | 3.00096e+07 (1 row) Time: 2.687 ms gpadmin=#
Clean up some rows.
gpadmin=# delete from vacuum_test where a<800000; DELETE 2399997 Time: 967.301 ms ^ gpadmin=# delete from vacuum_test where a<8000000; DELETE 21600000 Time: 2184.398 ms
Performing an ANALYZE on the table:
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758; bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag ----------+------------+------------+-------------+-------------+--------- (0 rows) Time: 26.318 ms gpadmin=# analyze vacuum_test ; ANALYZE Time: 1881.765 ms gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758; bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag ----------+------------+-------------+-------------+-------------+------------------------------------ 359758 | public | vacuum_test | 47710 | 7763 | moderate amount of bloat suspected (1 row) Time: 19.336 ms
Performing VACUUM on the table.
gpadmin=# vacuum vacuum_test ; VACUUM Time: 132.314 ms You will find that the vacuum has not released any space from the table , the table still uses the same OS blocks as done previously. gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758; bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag ----------+------------+-------------+-------------+-------------+------------------------------------ 359758 | public | vacuum_test | 47710 | 8057 | moderate amount of bloat suspected (1 row) Time: 18.026 ms gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test'; oid | relname | relpages | reltuples --------+-------------+----------+----------- 359758 | vacuum_test | 47710 | 6e+06 (1 row) Time: 1.606 ms gpadmin=#
Observe that the table below eports the same value as the table above. In addition, the "gp_bloat_diag" is reporting the table has bloat.
gpadmin=# analyze vacuum_test ; ANALYZE Time: 1176.675 ms gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test'; oid | relname | relpages | reltuples --------+-------------+----------+------------- 359758 | vacuum_test | 47710 | 5.80361e+06 (1 row) Time: 2.638 ms gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758; bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag ----------+------------+-------------+-------------+-------------+------------------------------------ 359758 | public | vacuum_test | 47710 | 7793 | moderate amount of bloat suspected (1 row) Time: 17.707 ms gpadmin=# gpadmin=#
Add data to the table that has had VACUUM performed on it.
gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test'); INSERT 0 10000000 Time: 5171.964 ms
Witness the insert used up the freed space from the table and did not consume any extra pages from the OS.
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758; bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag ----------+------------+-------------+-------------+-------------+------------------------------------ 359758 | public | vacuum_test | 47710 | 8057 | moderate amount of bloat suspected (1 row) Time: 16.548 ms gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test'; oid | relname | relpages | reltuples --------+-------------+----------+----------- 359758 | vacuum_test | 47710 | 6e+06 (1 row) Time: 1.509 ms gpadmin=# analyze vacuum_test ; ANALYZE Time: 1734.669 ms gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758; bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag ----------+------------+------------+-------------+-------------+--------- (0 rows) Time: 16.291 ms gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test'; oid | relname | relpages | reltuples --------+-------------+----------+------------- 359884 | vacuum_test | 47710 | 1.62695e+07 (1 row) Time: 1.612 ms gpadmin=# Vacuum Full When you do a vacuum full the data is now re-organized and the unused spaced are given back to the OS. gpadmin=# vacuum full vacuum_test ; NOTICE: 'VACUUM FULL' is not safe for large tables and has been known to yield unpredictable runtimes. HINT: Use 'VACUUM' instead. VACUUM Time: 1452.043 ms gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758; bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag ----------+------------+------------+-------------+-------------+--------- (0 rows) Time: 17.503 ms gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test'; oid | relname | relpages | reltuples --------+-------------+----------+----------- 359758 | vacuum_test | 9552 | 6e+06 (1 row) Time: 1.452 ms gpadmin=#
Note: Running VACUUM FULL on a very large table can lead to an unexpected runtime and during this runtime the table will be on exclusive lock the whole time. A quicker approach may be to do one of the following: