
How to identify and fix various components of a slow-performing query using EXPLAIN and EXPLAIN ANALYZE in Tanzu Greenplum


Article ID: 296707


Products

VMware Tanzu Greenplum, VMware Tanzu Greenplum / Gemfire, VMware Tanzu Data Suite

Issue/Introduction

When a query performs poorly, the EXPLAIN and EXPLAIN ANALYZE commands show the query plan; EXPLAIN ANALYZE also executes the query and reports the actual elapsed time taken to run it.

This article covers how to read these query plans and how to fix issues with specific components of the query execution.
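
As a starting point, the minimal sketch below shows how the two commands are typically invoked; the sales table and its columns are placeholders used only for illustration.

    -- Show the estimated plan without executing the query.
    EXPLAIN
    SELECT count(*) FROM sales WHERE sale_date >= '2023-01-01';

    -- Execute the query and report actual row counts and timing per plan node.
    EXPLAIN ANALYZE
    SELECT count(*) FROM sales WHERE sale_date >= '2023-01-01';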

Resolution

If a query performs poorly, examine the query plan and ask the following questions:

  • Do operations in the plan take an exceptionally long time?
Look for an operation that consumes the majority of query processing time. For example, if an index scan takes longer than expected, the index could be out of date and may need to be rebuilt, as in the sketch below.
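
A minimal sketch of rebuilding a suspect index; the index and table names are placeholders.

    -- Rebuild a single suspect index (placeholder name).
    REINDEX INDEX sales_date_idx;

    -- Or rebuild all indexes on the table.
    REINDEX TABLE sales;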
 
  • Does the query planning time exceed query execution time?
When the query involves many table joins, the Postgres Planner uses a dynamic algorithm to plan the query that depends in part on the number of table joins. You can reduce the amount of time that the Postgres Planner spends planning the query by setting the join_collapse_limit and from_collapse_limit server configuration parameters to a smaller value, such as 8. Note that while smaller values reduce planning time, they may also yield inferior query plans.
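
A minimal sketch of the session-level settings described above; 8 is only an example value.

    -- Smaller values reduce planning time for many-join queries,
    -- but may also yield inferior query plans.
    SET join_collapse_limit = 8;
    SET from_collapse_limit = 8;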
 
  • Are the optimizer's estimates close to reality?
Run EXPLAIN ANALYZE and see if the number of rows the optimizer estimates is close to the number of rows the query operation actually returns. If there is a large discrepancy, collect more statistics on the relevant columns.
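
For example, statistics can be refreshed on the columns that show a large estimate/actual gap; the table, column names, and statistics target below are placeholders.

    -- Re-collect statistics on the columns the optimizer misestimated.
    ANALYZE sales (customer_id, sale_date);

    -- Optionally raise the statistics target for a skewed column, then re-analyze.
    ALTER TABLE sales ALTER COLUMN customer_id SET STATISTICS 400;
    ANALYZE sales (customer_id);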
 
  • Are selective predicates applied early in the plan?
Apply the most selective filters early in the plan so fewer rows move up the plan tree. If the query plan does not correctly estimate query predicate selectivity, collect more statistics on the relevant columns. See the ANALYZE command in the Greenplum Database Reference Guide for more information on collecting statistics. You can also try reordering the WHERE clause of your SQL statement.
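
As an illustration (table, columns, and values are placeholders), analyze the predicate columns and, if needed, try writing the most selective condition first in the WHERE clause:

    -- Ensure selectivity estimates for the filter columns are current.
    ANALYZE sales (region_id, sale_date);

    -- Most selective predicate listed first.
    SELECT count(*)
    FROM sales
    WHERE region_id = 42
      AND sale_date >= '2023-01-01';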
 
  • Does the optimizer choose the best join order?
When you have a query that joins multiple tables, make sure that the optimizer chooses the most selective join order. Joins that eliminate the largest number of rows should be done earlier in the plan so fewer rows move up the plan tree. If the plan is not choosing the optimal join order, set join_collapse_limit=1 and use explicit JOIN syntax in your SQL statement to force the Postgres Planner to follow the specified join order. You can also collect more statistics on the relevant join columns.
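
A sketch of forcing a join order, assuming hypothetical tables small_dim, big_fact, and other_fact:

    -- Make the Postgres Planner honor the join order written in the query.
    SET join_collapse_limit = 1;

    -- Write the most selective join first so fewer rows flow up the plan tree.
    SELECT count(*)
    FROM small_dim d
    JOIN big_fact f   ON f.dim_id  = d.id
    JOIN other_fact o ON o.fact_id = f.id
    WHERE d.code = 'ABC';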
 
  • Does the optimizer selectively scan partitioned tables?
If you use table partitioning, is the optimizer selectively scanning only the child tables required to satisfy the query predicates? Scans of the parent tables should return 0 rows since the parent tables do not contain any data.
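
For example, with a range-partitioned table (the sales table and its partition key are placeholders), a predicate on the partition key should let the plan touch only the matching child partitions:

    -- The plan should show scans of only the child partitions that
    -- satisfy the date predicate, not the entire table.
    EXPLAIN
    SELECT count(*)
    FROM sales
    WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31';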
 
  • Does the optimizer choose hash aggregate and hash join operations where applicable?
Hash operations are typically much faster than other types of joins or aggregations. Row comparison and sorting are done in memory rather than by reading from and writing to disk. To enable the query optimizer to choose hash operations, there must be sufficient memory available to hold the estimated number of rows. Try increasing work memory to improve performance for a query. If possible, run an EXPLAIN ANALYZE for the query to show which plan operations spilled to disk, how much work memory they used, and how much memory was required to avoid spilling to disk.
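
A minimal sketch of raising per-query memory before re-checking the plan; in Greenplum the statement_mem parameter typically governs per-query memory, and the value and query below are examples only.

    -- Allow more memory per query so hash joins/aggregates can stay in memory.
    SET statement_mem = '256MB';

    -- Re-run with EXPLAIN ANALYZE and look for operations that still spill
    -- to disk, along with the memory each operation reports using.
    EXPLAIN ANALYZE
    SELECT customer_id, sum(amount)
    FROM sales
    GROUP BY customer_id;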