Determining tables in a query that requiring Analyze (gpmt gpstatscheck)
search cancel

Determining tables in a query that requiring Analyze (gpmt gpstatscheck)

book

Article ID: 295207

calendar_today

Updated On:

Products

VMware Tanzu Greenplum Greenplum Pivotal Data Suite Non Production Edition VMware Tanzu Data Suite VMware Tanzu Data Suite

Issue/Introduction

The query is running slower than expected.

One issue can be outdated or invalid statistics on the tables involved in the query. This can happen if new data has been loaded into the table but an Analyze was never executed. The database will use wrong statistics when generating the query plan.

 


Resolution

The gpstatscheck utility in gpmt/gpsupport can be used to verify that all tables involved in a query have up-to-date statistics.

The options that are available with the utility.

[gpadmin@localhost ~]$ ./gpmt gpstatscheck --help

Usage:
gpmt gpstatscheck -f QUERYFILE
        [ -p PORT ] [ -d DATABASE ]
   

Description:
    
gpstatscheck idendtifies tables with outdated statistics.
    
It performs the following actions:
1. Accepts an input query file of the form: EXPLAIN VERBOSE <QUERY>.
2. ​Execute the input file and scan EXPLAIN VERBOSE plan to identify the scanned tables.
3. Execute a query to get count(*) and pg_class.reltuples for each table.
4. Check if difference between count and reltuples is above the defined threshold 5.0%.
5. Generate an SQL file in current directory containing the ANALYZE commands.

Options:
-f Query file
    -p Port (defaults to 5432)
    -d Database (defaults to gpadmin)
   

Example:
Execute query1.sql and check for missing stats
        gpmt gpstatscheck -f query1.sql
     

Example

  1. Create a file containing "EXPLAIN VERBOSE <problematic query>":
    $ cat /tmp/myquery.sql EXPLAIN VERBOSE SELECT * FROM mytable, nums, table1;
  2. Run gpstatscheck:
    $ gpmt gpstatscheck -f /tmp/myquery.sql -d postgres
    Executing EXPLAIN VERBOSE query.
    Found 3 tables in query. 
    
    Table Details 
    -----------------------------------------
    | Table Name | Info |
    -----------------+-----------------------
    | public.nums | |
    | public.table1 | |
    | public.mytable | |
    -----------------+-----------------------
    
    Note: Views and External Tables will be skipped since they do not have statistics.
    Note: Top Level Partitions will be skipped since Legacy Query Optimizer does not use the statistics.
    
    OK to execute "SELECT count(*)" on tables listed above? Yy|Nn: y
    
    Executing count(*) to get actual tuple counts:
     -> public.nums ... done
     -> public.table1 ... done
     -> public.mytable ... done
    
     Stats Check Summary 
    -------------------------------------------------------------------------------------------
    | Table Name     | Actual           | Estimated        | Diff      | Comments             |
    -----------------+------------------+------------------+-----------+-----------------------
    | public.nums    |             1100 |              100 |      1000 | Needs ANALYZE        |
    | public.table1  |            10000 |            10000 |         0 | OK                   |
    | public.mytable |                0 |                0 |         0 | OK                   |
    -----------------+------------------+------------------+-----------+-----------------------
    
    Generating ANALYZE commands.
    
    Output file:
     gpstatscheck_20160926_134946.sql
    
    Execute using:
     psql -p 5432 -d postgres -f gpstatscheck_20160926_134946.sql
    
    Execution finished successfully!
  3. If invalid statistics are detected the tool will generate an ANALYZE script shown above:
    $ cat gpstatscheck_20160926_134946.sql
    ANALYZE public.nums;
  4. Execute the ANALYZE script:
    psql -p 5432 -d postgres -f gpstatscheck_20160926_134946.sql
  5. Now run gpstatscheck again and all the tables should show OK:
    $ gpmt gpstatscheck -f /tmp/myquery.sql -d postgres
    -------------------------------------------------------------------------------------------
    | Table Name     | Actual           | Estimated        | Diff      | Comments             |
    -----------------+------------------+------------------+-----------+-----------------------
    | public.nums    |             1100 |             1100 |         0 | OK                   |
    | public.table1  |            10000 |            10000 |         0 | OK                   |
    | public.mytable |                0 |                0 |         0 | OK                   |
    -----------------+------------------+------------------+-----------+-----------------------

Additional Information

+ Environment:

Pivotal Greenplum Database (GPDB) all versions