Detecting Bloat in GPDB
search cancel

Detecting Bloat in GPDB

book

Article ID: 379035

calendar_today

Updated On:

Products

VMware Tanzu Data Suite VMware Tanzu Data Suite Greenplum VMware Tanzu Greenplum Pivotal Data Suite Non Production Edition

Issue/Introduction

In Greenplum Database (GPDB) version 6.x the gp_toolkit.gp_bloat_diag function is intended to help identify bloated tables. However, there are circumstances where bloat may not be detected even when it exists.

This article details the issue, the underlying causes, potential resolutions, and additional considerations.

Environment

Greenplum Database 6.x

Cause

The gp_toolkit.gp_bloat_diag User Defined Function (UDF) identifies bloat based on page counts. It only reports a heap table as bloated if the actual number of pages is greater than three times the expected number of pages. Consequently, if a table does not meet this threshold, it will not appear in the results.


The current function definition is configured as below to detect “moderate” bloat -

CREATE FUNCTION gp_toolkit.gp_bloat_diag(btdrelpages int, btdexppages numeric, aotable bool,
    OUT bltidx int, OUT bltdiag text)
AS
$$
    SELECT
        bloatidx,
        CASE
            WHEN bloatidx = 0
                THEN 'no bloat detected'::text
            WHEN bloatidx = 1
                THEN 'moderate amount of bloat suspected'::text
            WHEN bloatidx = 2
                THEN 'significant amount of bloat suspected'::text
            WHEN bloatidx = -1
                THEN 'diagnosis inconclusive or no bloat suspected'::text
        END AS bloatdiag
    FROM
    (
        SELECT
            CASE
                WHEN $3 = 't' THEN 0
                WHEN $1 < 10 AND $2 = 0 THEN -1
                WHEN $2 = 0 THEN 2
                WHEN $1 < $2 THEN 0
                WHEN ($1/$2)::numeric > 10 THEN 2
                WHEN ($1/$2)::numeric > 3 THEN 1 
                ELSE -1     -------------- PROBLEM is here (for us $1/$2 is ~2)
            END AS bloatidx
    ) AS bloatmapping

$$
LANGUAGE SQL READS SQL DATA;

Resolution

It is crucial to maintain up-to-date statistics on tables. Regularly performing the ANALYZE command on each table can refresh these statistics:

ANALYZE table_name;

 

Even after performing above operation if the bloat is undetected, perform the below workaround:

It is possible to create a new function that defines bloat using different criteria.

 

-- Define a new function gp_bloat_diag_strict
CREATE FUNCTION gp_toolkit.gp_bloat_diag_strict(btdrelpages int, btdexppages numeric, aotable bool,
    OUT bltidx int, OUT bltdiag text)
AS
$$
    SELECT
        bloatidx,
        CASE
            WHEN bloatidx = 0
                THEN 'no bloat detected'::text
            WHEN bloatidx = 1
                THEN 'moderate amount of bloat suspected'::text
            WHEN bloatidx = 2
                THEN 'significant amount of bloat suspected'::text
            WHEN bloatidx = -1
                THEN 'diagnosis inconclusive or no bloat suspected'::text
        END AS bloatdiag
    FROM
    (
        SELECT
            CASE
                WHEN $3 = 't' THEN 0
                WHEN $1 < 10 AND $2 = 0 THEN -1
                WHEN $2 = 0 THEN 2
                WHEN $1 < $2 THEN 0
                WHEN ($1/$2)::numeric > 10 THEN 2
                WHEN ($1/$2)::numeric > 1.5 THEN 1  --- It is possible to alter this ratio as required.
                ELSE -1
            END AS bloatidx
    ) AS bloatmapping

$$
LANGUAGE SQL READS SQL DATA;

-- And new view gp_bloat_diag_strict:
CREATE VIEW gp_toolkit.gp_bloat_diag_strict
AS
    SELECT
        btdrelid AS bdirelid,
        fnnspname AS bdinspname,
        fnrelname AS bdirelname,
        btdrelpages AS bdirelpages,
        btdexppages AS bdiexppages,
        bltdiag(bd) AS bdidiag
    FROM
    (
        SELECT
            fn.*, beg.*,
            gp_toolkit.gp_bloat_diag_strict(btdrelpages::int, btdexppages::numeric, iao.iaotype::bool) AS bd
        FROM
            gp_toolkit.gp_bloat_expected_pages beg,
            pg_catalog.pg_class pgc,
            gp_toolkit.__gp_fullname fn,
            gp_toolkit.__gp_is_append_only iao

        WHERE beg.btdrelid = pgc.oid
            AND pgc.oid = fn.fnoid
            AND iao.iaooid = pgc.oid
    ) as bloatsummary
    WHERE bltidx(bd) > 0;

 

Use the above view and if needed change the definition of what qualifies as “moderate” bloat. Adjust the coefficients in the CASE statement inside the function (In above function we have defined it as 1.5 times)