In Greenplum Database (GPDB) version 6.x the gp_toolkit.gp_bloat_diag
function is intended to help identify bloated tables. However, there are circumstances where bloat may not be detected even when it exists.
This article details the issue, the underlying causes, potential resolutions, and additional considerations.
Greenplum Database 6.x
The gp_toolkit.gp_bloat_diag User Defined Function (UDF) identifies bloat based on page counts. It only reports a heap table as bloated if the actual number of pages is greater than three times the expected number of pages. Consequently, if a table does not meet this threshold, it will not appear in the results.
The current function definition is configured as below to detect “moderate” bloat -
CREATE FUNCTION gp_toolkit.gp_bloat_diag(btdrelpages int, btdexppages numeric, aotable bool,
OUT bltidx int, OUT bltdiag text)
AS
$$
SELECT
bloatidx,
CASE
WHEN bloatidx = 0
THEN 'no bloat detected'::text
WHEN bloatidx = 1
THEN 'moderate amount of bloat suspected'::text
WHEN bloatidx = 2
THEN 'significant amount of bloat suspected'::text
WHEN bloatidx = -1
THEN 'diagnosis inconclusive or no bloat suspected'::text
END AS bloatdiag
FROM
(
SELECT
CASE
WHEN $3 = 't' THEN 0
WHEN $1 < 10 AND $2 = 0 THEN -1
WHEN $2 = 0 THEN 2
WHEN $1 < $2 THEN 0
WHEN ($1/$2)::numeric > 10 THEN 2
WHEN ($1/$2)::numeric > 3 THEN 1
ELSE -1 -------------- PROBLEM is here (for us $1/$2 is ~2)
END AS bloatidx
) AS bloatmapping
$$
LANGUAGE SQL READS SQL DATA;
It is crucial to maintain up-to-date statistics on tables. Regularly performing the ANALYZE
command on each table can refresh these statistics:
ANALYZE table_name;
Even after performing above operation if the bloat is undetected, perform the below workaround:
It is possible to create a new function that defines bloat using different criteria.
-- Define a new function gp_bloat_diag_strict
CREATE FUNCTION gp_toolkit.gp_bloat_diag_strict(btdrelpages int, btdexppages numeric, aotable bool,
OUT bltidx int, OUT bltdiag text)
AS
$$
SELECT
bloatidx,
CASE
WHEN bloatidx = 0
THEN 'no bloat detected'::text
WHEN bloatidx = 1
THEN 'moderate amount of bloat suspected'::text
WHEN bloatidx = 2
THEN 'significant amount of bloat suspected'::text
WHEN bloatidx = -1
THEN 'diagnosis inconclusive or no bloat suspected'::text
END AS bloatdiag
FROM
(
SELECT
CASE
WHEN $3 = 't' THEN 0
WHEN $1 < 10 AND $2 = 0 THEN -1
WHEN $2 = 0 THEN 2
WHEN $1 < $2 THEN 0
WHEN ($1/$2)::numeric > 10 THEN 2
WHEN ($1/$2)::numeric > 1.5 THEN 1 --- It is possible to alter this ratio as required.
ELSE -1
END AS bloatidx
) AS bloatmapping
$$
LANGUAGE SQL READS SQL DATA;
-- And new view gp_bloat_diag_strict:
CREATE VIEW gp_toolkit.gp_bloat_diag_strict
AS
SELECT
btdrelid AS bdirelid,
fnnspname AS bdinspname,
fnrelname AS bdirelname,
btdrelpages AS bdirelpages,
btdexppages AS bdiexppages,
bltdiag(bd) AS bdidiag
FROM
(
SELECT
fn.*, beg.*,
gp_toolkit.gp_bloat_diag_strict(btdrelpages::int, btdexppages::numeric, iao.iaotype::bool) AS bd
FROM
gp_toolkit.gp_bloat_expected_pages beg,
pg_catalog.pg_class pgc,
gp_toolkit.__gp_fullname fn,
gp_toolkit.__gp_is_append_only iao
WHERE beg.btdrelid = pgc.oid
AND pgc.oid = fn.fnoid
AND iao.iaooid = pgc.oid
) as bloatsummary
WHERE bltidx(bd) > 0;
Use the above view and if needed change the definition of what qualifies as “moderate” bloat. Adjust the coefficients in the CASE statement inside the function (In above function we have defined it as 1.5 times)