VACUUM and VACUUM FULL explained
search cancel

VACUUM and VACUUM FULL explained

book

Article ID: 296267

calendar_today

Updated On:

Products

VMware Tanzu Greenplum Greenplum VMware Tanzu Data Suite VMware Tanzu Data Suite

Issue/Introduction

This article explains how VACUUM and VACUUM FULL works.

This article explains how VACUUM and VACUUM FULL works with an example.


Environment

Product Version: 4.3

Resolution

VACUUM

When performing VACUUM, the dead space is reclaimed and made available for re-use by the same object or table etc.
 

VACUUM ANALYZE

VACUUM ANALYZE performs a VACUUM and then an ANALYZE for each selected table.
 

VACUUM FULL

VACUUM FULL writes the entire content of the table into a new disk file and releases the wasted space back to OS. This causes a table-level lock on the table and slow speeds. VACUUM FULL should be avoided on a high load system.

With architecture changes in Greeplum version 6 VACUUM FULL can be run online for users tables. More details can be found in article Vacuum and Vacuum Full for VMware Tanzu Greenplum 6.  
 

How does VACUUM work?

When performing a VACUUM. the Greenplum (postgres) updates the free space map (FSM) and keep tracks of the free space. Subsequent updates on the relation will be tracked in the FSM through the internal table and used when needed. Issuing a Vacuum Verbose will produce the following:

INFO:  "vacuum_test": removed 1333496 row versions in 2127 pages  (seg17 sdw3:11039 pid=18570)
INFO:  "vacuum_test": removed 1333104 row versions in 2126 pages  (seg2 sdw1:11036 pid=10693)
INFO:  "vacuum_test": found 1333496 removable, 583324 nonremovable row versions in 3048 pages  (seg17 sdw3:11039 pid=18570)
DETAIL:  0 dead row versions cannot be removed yet.
.........
..........
INFO:  "vacuum_test": removed 1333212 row versions in 2123 pages  (seg1 sdw1:11035 pid=10687)
INFO:  "vacuum_test": found 1333212 removable, 583639 nonremovable row versions in 3048 pages  (seg1 sdw1:11035 pid=10687)
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
2124 pages contain useful free space.
0 pages are entirely empty.

Tune the parameter "max_fsm_relations" (default : 1000) and "max_fsm_pages" (default : 200000) to report on the master log when the FSM reaches a specific level.

Note: By design "max_fsm_pages" must be exceed 16 times "max_fsm_relations", or else the database will not start.
 

Example

Create a test table.

gpadmin=# create table vacuum_test ( a int , b text ) ;
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
Time: 359.060 ms
gpadmin=#


The value of the tuples and pages on the table:

gpadmin=#
gpadmin=# select relname , relpages , reltuples from pg_class where relname='vacuum_test';
   relname   | relpages | reltuples
-------------+----------+-----------
 vacuum_test |        0 |         0
(1 row)
Time: 2.438 ms
gpadmin=#
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------
(0 rows)
Time: 25.589 ms
gpadmin=#


Insert the test data:

gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test');
INSERT 0 10000000
Time: 6245.503 ms
gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test');
INSERT 0 10000000
Time: 4697.305 ms
gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test');
INSERT 0 10000000
Time: 4319.884 ms
gpadmin=#
gpadmin=#


The current usage of the table (per tuples or pages).

gpadmin=# select relname , relpages , reltuples from pg_class where relname='vacuum_test';
   relname   | relpages |  reltuples
-------------+----------+-------------
 vacuum_test |    15912 | 1.00086e+07
(1 row)
Time: 2.585 ms
Analyzing the table to know the correct data
gpadmin=# analyze vacuum_test ;
ANALYZE
Time: 1789.892 ms
gpadmin=# select relname , relpages , reltuples from pg_class where relname='vacuum_test';
   relname   | relpages |  reltuples
-------------+----------+-------------
 vacuum_test |    47710 | 3.00096e+07
(1 row)
Time: 2.687 ms
gpadmin=#


Clean up some rows.

gpadmin=# delete from vacuum_test  where a<800000;
DELETE 2399997
Time: 967.301 ms
                                                  ^
gpadmin=# delete from vacuum_test  where a<8000000;
DELETE 21600000
Time: 2184.398 ms


Performing an ANALYZE on the table:

gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------
(0 rows)
Time: 26.318 ms
gpadmin=# analyze vacuum_test ;
ANALYZE
Time: 1881.765 ms
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname  | bdirelpages | bdiexppages |              bdidiag
----------+------------+-------------+-------------+-------------+------------------------------------
   359758 | public     | vacuum_test |       47710 |        7763 | moderate amount of bloat suspected
(1 row)
Time: 19.336 ms

Performing VACUUM on the table.

gpadmin=# vacuum vacuum_test ;
VACUUM
Time: 132.314 ms
You will find that the vacuum has not released any space from the table , the table still uses the same OS blocks as done previously.

gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname  | bdirelpages | bdiexppages |              bdidiag
----------+------------+-------------+-------------+-------------+------------------------------------
   359758 | public     | vacuum_test |       47710 |        8057 | moderate amount of bloat suspected
(1 row)
Time: 18.026 ms
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages | reltuples
--------+-------------+----------+-----------
 359758 | vacuum_test |    47710 |     6e+06
(1 row)
Time: 1.606 ms
gpadmin=#

Observe that the table below eports the same value as the table above. In addition, the "gp_bloat_diag" is reporting the table has bloat.

gpadmin=# analyze vacuum_test ;
ANALYZE
Time: 1176.675 ms
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages |  reltuples
--------+-------------+----------+-------------
 359758 | vacuum_test |    47710 | 5.80361e+06
(1 row)
Time: 2.638 ms
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname  | bdirelpages | bdiexppages |              bdidiag
----------+------------+-------------+-------------+-------------+------------------------------------
   359758 | public     | vacuum_test |       47710 |        7793 | moderate amount of bloat suspected
(1 row)
Time: 17.707 ms
gpadmin=#
gpadmin=#

Add data to the table that has had VACUUM performed on it.

gpadmin=# insert into vacuum_test values (generate_series(1,10000000),'this is a test');
INSERT 0 10000000
Time: 5171.964 ms

Witness the insert used up the freed space from the table and did not consume any extra pages from the OS.

gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname  | bdirelpages | bdiexppages |              bdidiag
----------+------------+-------------+-------------+-------------+------------------------------------
   359758 | public     | vacuum_test |       47710 |        8057 | moderate amount of bloat suspected
(1 row)
Time: 16.548 ms
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages | reltuples
--------+-------------+----------+-----------
 359758 | vacuum_test |    47710 |     6e+06
(1 row)
Time: 1.509 ms
gpadmin=# analyze vacuum_test ;
ANALYZE
Time: 1734.669 ms

gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------
(0 rows)
Time: 16.291 ms
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages |  reltuples
--------+-------------+----------+-------------
 359884 | vacuum_test |    47710 | 1.62695e+07
(1 row)
Time: 1.612 ms
gpadmin=#
Vacuum Full

When you do a vacuum full the data is now re-organized and the unused spaced are given back to the OS.

gpadmin=# vacuum full vacuum_test ;
NOTICE:  'VACUUM FULL' is not safe for large tables and has been known to yield unpredictable runtimes.
HINT:  Use 'VACUUM' instead.
VACUUM
Time: 1452.043 ms
gpadmin=# select * from gp_toolkit.gp_bloat_diag where bdirelid=359758;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------
(0 rows)
Time: 17.503 ms
 
gpadmin=# select oid,relname , relpages , reltuples from pg_class where relname='vacuum_test';
  oid   |   relname   | relpages | reltuples
--------+-------------+----------+-----------
 359758 | vacuum_test |     9552 |     6e+06
(1 row)
Time: 1.452 ms
gpadmin=#

Note: Running VACUUM FULL on a very large table can lead to an unexpected runtime and during this runtime the table will be on exclusive lock the whole time. A quicker approach may be to do one of the following: