The query is running slower than expected.
One issue can be outdated or invalid statistics on the tables involved in the query. This can happen if new data has been loaded into the table but an Analyze was never executed. The database will use wrong statistics when generating the query plan.
The gpstatscheck utility in gpmt/gpsupport can be used to verify that all tables involved in a query have up-to-date statistics.
The options that are available with the utility.
[gpadmin@localhost ~]$ ./gpmt gpstatscheck --help Usage:
gpmt gpstatscheck -f QUERYFILE [ -p PORT ] [ -d DATABASE ]
Description: gpstatscheck idendtifies tables with outdated statistics. It performs the following actions: 1. Accepts an input query file of the form: EXPLAIN VERBOSE <QUERY>. 2. ​Execute the input file and scan EXPLAIN VERBOSE plan to identify the scanned tables. 3. Execute a query to get count(*) and pg_class.reltuples for each table. 4. Check if difference between count and reltuples is above the defined threshold 5.0%. 5. Generate an SQL file in current directory containing the ANALYZE commands. Options:
-f Query file -p Port (defaults to 5432) -d Database (defaults to gpadmin)
Example:
Execute query1.sql and check for missing stats gpmt gpstatscheck -f query1.sql
Example
$ cat /tmp/myquery.sql EXPLAIN VERBOSE SELECT * FROM mytable, nums, table1;
$ gpmt gpstatscheck -f /tmp/myquery.sql -d postgres Executing EXPLAIN VERBOSE query. Found 3 tables in query. Table Details ----------------------------------------- | Table Name | Info | -----------------+----------------------- | public.nums | | | public.table1 | | | public.mytable | | -----------------+----------------------- Note: Views and External Tables will be skipped since they do not have statistics. Note: Top Level Partitions will be skipped since Legacy Query Optimizer does not use the statistics. OK to execute "SELECT count(*)" on tables listed above? Yy|Nn: y Executing count(*) to get actual tuple counts: -> public.nums ... done -> public.table1 ... done -> public.mytable ... done Stats Check Summary ------------------------------------------------------------------------------------------- | Table Name | Actual | Estimated | Diff | Comments | -----------------+------------------+------------------+-----------+----------------------- | public.nums | 1100 | 100 | 1000 | Needs ANALYZE | | public.table1 | 10000 | 10000 | 0 | OK | | public.mytable | 0 | 0 | 0 | OK | -----------------+------------------+------------------+-----------+----------------------- Generating ANALYZE commands. Output file: gpstatscheck_20160926_134946.sql Execute using: psql -p 5432 -d postgres -f gpstatscheck_20160926_134946.sql Execution finished successfully!
$ cat gpstatscheck_20160926_134946.sql ANALYZE public.nums;
psql -p 5432 -d postgres -f gpstatscheck_20160926_134946.sql
$ gpmt gpstatscheck -f /tmp/myquery.sql -d postgres ------------------------------------------------------------------------------------------- | Table Name | Actual | Estimated | Diff | Comments | -----------------+------------------+------------------+-----------+----------------------- | public.nums | 1100 | 1100 | 0 | OK | | public.table1 | 10000 | 10000 | 0 | OK | | public.mytable | 0 | 0 | 0 | OK | -----------------+------------------+------------------+-----------+-----------------------
Pivotal Greenplum Database (GPDB) all versions