The purpose of this document is to explain how to identify bloat within append optimized/append optimized column oriented (AO/AOCO) tables that require compaction.
Note: This document only applies to AO/AOCO tables in v4.3.x. For heap tables in v4.3.x, please refer to article. This article will also work on AO/AOCO tables in Greenplum 5.x and above.
Append-optimized tables were first introduced in GPDB v.4.3, adding the ability to UPDATE and DELETE data from Append Only tables.
To accommodate the ability to UPDATE and DELETE data, when an append-optimized table is created, an additional visibility bitmap (visimap) table is created with a B-tree index. This visimap table contains information on which rows are visible in the append optimized table, including details on the number of records on each segment, how many tuples or rows are visible, and how many are hidden (due to an update or delete).
Since the data is only marked "hidden" during the UPDATE or DELETE operations, these rows still take up disk space and the table will require regular maintenance to reclaim the space by compacting the append-optimized table.
The first step is to identify which tables could benefit from compaction. The gp_toolkit schema provides the function __gp_aovisimap_compaction_info(oid) to display the number of tuples in the on disk data files as compared to the visibility bitmap table and details whether compaction will assist in reclaiming space.
The function utilizes the server configuration parameter gp_appendonly_compaction_threshold to know the threshold in which to flag compaction_possible as true.
The default is 10 or 10%.
After identifying tables which could be compacted, a VACUUM can be run against them to reclaim this space.
The following demonstrates the steps involved in this process.
1. Query for the oid of the append-only table if the unknown is shown here:
gpadmin=# SELECT oid FROM pg_class WHERE relname='mytable'; oid ------- 47372 (1 row)
gpadmin=# SELECT * FROM gp_toolkit.__gp_aovisimap_compaction_info(47372); NOTICE: gp_appendonly_compaction_threshold = 10 content | datafile | compaction_possible | hidden_tupcount | total_tupcount | percent_hidden ---------+----------+---------------------+-----------------+----------------+---------------- 2 | 1 | t | 33333 | 99999 | 33.33 (1 row)
gpadmin=# VACUUM mytable;
gpadmin=# SELECT * FROM gp_toolkit.__gp_aovisimap_compaction_info(47372); NOTICE: gp_appendonly_compaction_threshold = 10 content | datafile | compaction_possible | hidden_tupcount | total_tupcount | percent_hidden ---------+----------+---------------------+-----------------+----------------+---------------- 2 | 1 | f | 0 | 66666 | 0.00
Environment: VMware Tanzu Greenplum 4.3.x