How gp_stats_missing works
search cancel

How gp_stats_missing works

book

Article ID: 296249

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

This article explains how the view gp_stats_missing works with an example.


Environment

Product Version: 4.2

Resolution

gp_stats_missing, located under the gp_toolkit schema, is a tool built by Pivotal Greenplum (GPDB). By definition in the admin guide, it is a tool to check the tables that do not have statistics and therefore may require an ANALYZE be run on the table. However, this is NOT the case.

This article examines two cases to demonstrate that is not needed to run ANALYZE.

Case 1

The current value in the view, gp_stats_missing:
gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
    (0 rows)
Create a table:
gpdb=# create table test ( a int );
    NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
    HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
    CREATE TABLE
    Time: 11.534 ms
Now gp_stats_missing shows the following:
gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
     public    | test     | f       |       1 |       0
    (1 row)

This explains the documentation for gp_stats_missing that explains it is a view that explains or tells which table doesn't have statistics.

Now, let's ANALYZE the table:
 gpdb=# analyze test;
    ANALYZE
    Time: 37.658 ms
Observe that ANALYZE did not clear up the entry.
 gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
     public    | test     | f       |       1 |       0
    (1 row)

The reason the entry is not cleared up is in the gp_toolkit.gp_stats_missing view definition. If you check the definition of the view, it states it is checking if the table created has value zero for either relpages or reltuples in the pg_class table. This is the reason you will find the smisize column under the gp_stats_missing view with value false "f" stating either the relpages or reltuples has a value of zero.

 SELECT aut.autnspname AS smischema, aut.autrelname AS smitable,
        CASE
            WHEN aut.autrelpages = 0 OR aut.autreltuples = 0::double precision THEN false
            ELSE true
        END AS smisize, attrs.attcnt AS smicols, COALESCE(bar.stacnt, 0::bigint) AS smirecs
    FROM gp_toolkit.__gp_user_tables aut
    JOIN ( SELECT pg_attribute.attrelid, count(*) AS attcnt
           FROM pg_attribute
          WHERE pg_attribute.attnum > 0
          GROUP BY pg_attribute.attrelid) attrs ON aut.autoid = attrs.attrelid
    LEFT JOIN ( SELECT pg_statistic.starelid, count(*) AS stacnt
      FROM pg_statistic
     GROUP BY pg_statistic.starelid) bar ON aut.autoid = bar.starelid
    WHERE (aut.autrelkind = 'r'::"char" AND (aut.autrelpages = 0 OR aut.autreltuples = 0::double precision)) OR (bar.stacnt IS NOT NULL AND attrs.attcnt > bar.stacnt);

Therefore, when checking the pg_class entry, you will find that the tables have a value of zero for reltuples and relpages:

gpdb=# select oid,relname,reltuples,relpages from pg_class where relname='test';
    oid   | relname | reltuples | relpages
    --------+---------+-----------+----------
    309901 | test    |         0 |        0
    (1 row)
 
Time: 1.158 ms
Let's insert some data into the table:
gpdb=# insert into test values ( generate_series ( 1, 10000));
    INSERT 0 10000
    Time: 22.305 ms
    The table ‘pg_class’ now shows an update to relpages even though we have not analyzed: 
    gpdb=# select oid,relname,reltuples,relpages from pg_class where relname='test';
      oid   | relname | reltuples | relpages
    --------+---------+-----------+----------
     309901 | test    |         0 |        1
    (1 row)
     
    Time: 1.158 ms
This is due to gp_autostats_mode. This analyzes the zero records table for the first time when the data is first inserted.
gpdb=# show  gp_autostats_mode;
     gp_autostats_mode
    -------------------
     ON_NO_STATS
    (1 row)
     
    Time: 0.260 ms
    gpdb=#
However, it did not remove the data from gp_toolkit.gp_stats_missing:
 gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
     public    | test     | f       |       1 |       0
    (1 row)
     
    Time: 15.469 ms

Again, the reason for this lies in the definition of the gp_stats_missing view "aut.autrelpages = 0 OR aut.autreltuples = 0::double precision" which states both the relpages and reltuples should have an entry.

Now run a ANALYZE  on the data inserted:

 gpdb=# analyze test;
    ANALYZE
    Time: 101.794 ms
Cross verify in the pg_class:
gpdb=# select oid,relname,reltuples,relpages from pg_class where relname='test';
      oid   | relname | reltuples | relpages
    --------+---------+-----------+----------
     309901 | test    |     10000 |       16
    (1 row)

Observe that it is now working properly. 

gpdb=# select * from gp_toolkit.gp_stats_missing;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
    (0 rows)
     
    Time: 20.075 ms
    gpdb=#

Case 2

Another case where the gp_stat_missing shows values for the table is when you alter the table by dropping or adding columns. For example:

Add a new column:
alter table test ADD column (b int);
gp_stats_missing shows:
gpdb=# select * from gp_toolkit.gp_stats_missing ;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
     public    | test     | t       |       2 |       1
    (1 row)
     
    Time: 14.265 ms
    Where :  
    smisize = "t" meaning relpages or reltuples are not zero , "f" means either relpages or reltuples are zero
    smicols = there are two columns in the table "test"
    smirecs = only one of the column in the table "test" has statistics information in the pg_statistic table.
ANALYZE the table:
 gpdb=# analyze test;
    ANALYZE
    Time: 35.347 ms
    gpdb=#
gp_stat_missing now shows:
gpdb=# select * from gp_toolkit.gp_stats_missing ;
     smischema | smitable | smisize | smicols | smirecs
    -----------+----------+---------+---------+---------
    (0 rows)
     
    Time: 15.989 ms

Additional Information

In summary, the follow is true about gp_stats_missing:
  • gp_stats_missing tells you which table needs statistics or ANALYZE if the table with data has not been analyzed anytime in its lifecycle or when the table has altered to add or drop the column.
  • gp_stats_missing doesn't remove the entry from its view when the table has no rows or when tuples and pages have been reduced to zero after deleting data .
  • gp_stats_missing does not tell you if the table needs statistics after updates, deletions, or bulk loadings of data, since it only checks for whether relpages and reltuples = 0.