Identify tables that need VACUUM ANALYZE
search cancel

Identify tables that need VACUUM ANALYZE

book

Article ID: 295850

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

VACUUM ANALYZE performs a VACUUM and then an ANALYZE for each selected table. 

There are three different reasons to perform VACUUM ANALYZE on a table:

  • Reclaiming space from dead rows in order to prevent bloating (VACUUM).
  • Decrease transaction age for tables with high age (VACUUM).
  • Collect statistics so queries including this table have optimal execution plan (VACUUM).

This article describes the above reasons in detail.


Environment


Resolution

1. Reclaiming space from dead rows

There are a couple of views in the "gp_toolkit" administrative schema that shows the potential bloat in a table. These views compare the actual size of a table on the disk with the 'expected' size calculated from statistics. Therefore, it is necessary to ensure your tables ANALYZE before interpreting the results from these views.


gp_bloat_diag

 View "gp_toolkit.gp_bloat_diag"
   Column    |  Type   | Modifiers 
-------------+---------+-----------
 bdirelid    | oid     |              - OID
 bdinspname  | name    |              - Schema name
 bdirelname  | name    |              - Table name
 bdirelpages | integer |              - Number of table pages
 bdiexppages | numeric |              - Number of expected pages
 bdidiag     | text    |              - Diagnostic: "no bloat"/"moderate bloat"/"significant bloat"

Example

lpetrov=# select * from gp_toolkit.gp_bloat_diag;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages |                bdidiag                
----------+------------+------------+-------------+-------------+---------------------------------------
   353016 | public     | t1         |         978 |           1 | significant amount of bloat suspected
(1 row)

gp_bloat_expected_pages

lpetrov=# \d gp_toolkit.gp_bloat_expected_pages 
View "gp_toolkit.gp_bloat_expected_pages"
   Column    |  Type   | Modifiers 
-------------+---------+-----------
 btdrelid    | oid     |             - OID
 btdrelpages | integer |             - Number of table pages
 btdexppages | numeric |             - Number of expected pages

Example

lpetrov=# select * from gp_toolkit.gp_bloat_expected_pages where btdrelid = 't1'::regclass;
 btdrelid | btdrelpages | btdexppages 
----------+-------------+-------------
   353016 |         978 |           1
(1 row)

gp_bloat_expected_pages shows all table data, while gp_bloat_diag will only show data about the tables with suspected moderate or significant bloat. 

2. Lower transaction age for tables with high age

To lower the transaction age for tables with high age, follow this two-step process:


a. Find databases or segments with high age.
SELECT 
    -1, datname, age(datfrozenxid) 
FROM pg_database
UNION ALL
SELECT 
    gp_segment_id, datname, age(datfrozenxid)
FROM gp_dist_random('pg_database')
ORDER BY 3 DESC
b. Find the tables within these databases and the segments that need to be  VACUUM'ed. 


GPDB 4.3.x

SELECT 
    coalesce(n.nspname, ''), 
    relname, 
    relkind, 
    relstorage, 
    age(relfrozenxid)
FROM 
    pg_class c 
    LEFT JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE 
    relkind = 'r' AND relstorage NOT IN ('x')
ORDER BY 5 DESC

GPDB 5.x

SELECT 
      coalesce(n.nspname, ''), 
      relname, 
      relkind, 
      relstorage, 
      age(relfrozenxid) 
FROM 
    pg_class c 
    LEFT JOIN 
    pg_namespace n ON c.relnamespace = n.oid 
WHERE 
     relkind = 'r' AND relstorage NOT IN ('x','a','c') 
ORDER BY 5 DESC

Note:

  • Ignore the following tables if they show up as having a high age. These are non-MVCC tables and will not affect the age of the database. 
pg_catalog.gp_persistent_tablespace_node                   
pg_catalog.gp_persistent_database_node  
pg_catalog.gp_global_sequence           
pg_catalog.gp_persistent_relation_node  
pg_catalog.gp_persistent_filespace_node 
  • You can use a WHERE condition instead of using LIMIT to show only ages greater than some constant.
  • There is a possibility of orphan system tables causing high age and will not get revealed using the above queries. In that scenario the clause relkind = 'r' needs to be modified with relkind = 't' . Pivotal support needs to be engaged for the fixes needed.

3. Collect statistics, so queries including this table have an optimal execution plan

There is no way to find out tables with stale statistics currently, aside from examining EXPLAIN ANALYZE output, and comparing the estimated rows versus scanned rows for table scans.

There are two ways to get an approximate value through:

  • gp_toolkit.gp_stats_missing - View this data to show tables without statistics in the catalog.
  • pg_stat_last_operation - View this data to find out when the last ANALYZE occurred. If significant modifications were done to the table in the meantime, it probably needs to be analyzed.