A basic "SELECT *" with an "ORDER BY" clause runs longer than expected, exceeding the throughput that the Greenplum cluster's hardware, node count, and segment count would suggest.
While a well-scaled Greenplum Database cluster should significantly improve runtime performance, especially for basic SQL, a common contributor to overall execution time is the rate at which the client fetches the result set.
To assess the impact of result-set fetching, compare the EXPLAIN ANALYZE output with the actual execution time observed from the client application.
The following example demonstrates the performance impact at different returned-row counts:
testdb=# CREATE TABLE test AS SELECT generate_series(1, 1000000);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry.
SELECT 1000000
testdb=#
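The NOTICE above can be avoided by declaring an explicit distribution key in the CTAS statement; a minimal sketch (the column alias "n" is an assumption, since the original statement leaves the generated column unnamed):

```sql
-- Same test data, but with an explicit distribution key so Greenplum
-- hashes rows across segments instead of creating a NULL policy entry.
-- The alias "n" is illustrative; any column name works.
CREATE TABLE test AS
SELECT generate_series(1, 1000000) AS n
DISTRIBUTED BY (n);
```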
The following execution time results from running a basic "SELECT *" with "ORDER BY" and fetching all one million rows in the client application:
testdb=# \timing
Timing is on.
testdb=# \o /dev/null
testdb=# select * from test order by 1;
Time: 9328.913 ms
testdb=#
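When the client must consume the full result set, the fetch cost can also be spread out and measured in batches through a cursor, separating the per-batch fetch rate from server-side execution; a minimal sketch (the cursor name "c" and the batch size are arbitrary):

```sql
-- Fetch the same sorted result set in fixed-size batches, which lets the
-- client time each fetch independently of query execution on the segments.
BEGIN;
DECLARE c CURSOR FOR SELECT * FROM test ORDER BY 1;
FETCH 10000 FROM c;   -- repeat until FETCH returns fewer rows than requested
CLOSE c;
COMMIT;
```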
The following execution time results from eliminating both the result-set fetching overhead and the overhead of moving a high row count through the Gather Motion (actual time=2022.187..2022.187 rows=1):
testdb=# explain analyze select * from test order by 1 fetch first 1 row only;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..658.75 rows=1 width=4) (actual time=2022.237..2022.261 rows=1 loops=1)
-> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..658.75 rows=1 width=4) (actual time=2022.187..2022.187 rows=1 loops=1)
Merge Key: generate_series
-> Limit (cost=0.00..658.75 rows=1 width=4) (actual time=2019.755..2019.803 rows=1 loops=1)
-> Sort (cost=0.00..658.75 rows=500000 width=4) (actual time=2019.651..2019.651 rows=1 loops=1)
Sort Key: generate_series
Sort Method: top-N heapsort Memory: 18kB
-> Seq Scan on test (cost=0.00..440.35 rows=500000 width=4) (actual time=0.157..717.498 rows=500129 loops=1)
Planning time: 56.815 ms
(slice0) Executor memory: 59K bytes.
* (slice1) Executor memory: 58K bytes avg x 2 workers, 58K bytes max (seg0). Work_mem: 33K bytes max, 11723K bytes wanted.
Memory used: 128000kB
Memory wanted: 12122kB
Optimizer: Pivotal Optimizer (GPORCA)
Execution time: 2026.356 ms
(15 rows)
The following execution time results from eliminating only the result-set fetching overhead, with no row limit. The higher actual time in the Gather Motion (actual time=2662.428..6631.003 rows=1000000) is a direct consequence of moving the full row count to the master:
testdb=# explain analyze select * from test order by 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..676.71 rows=1000000 width=4) (actual time=2662.428..6631.003 rows=1000000 loops=1)
Merge Key: generate_series
-> Sort (cost=0.00..658.75 rows=500000 width=4) (actual time=2642.417..3070.718 rows=500129 loops=1)
Sort Key: generate_series
Sort Method: quicksort Memory: 57330kB
-> Seq Scan on test (cost=0.00..440.35 rows=500000 width=4) (actual time=0.134..581.057 rows=500129 loops=1)
Planning time: 31.944 ms
(slice0) Executor memory: 183K bytes.
(slice1) Executor memory: 28706K bytes avg x 2 workers, 28706K bytes max (seg0). Work_mem: 28665K bytes max.
Memory used: 128000kB
Optimizer: Pivotal Optimizer (GPORCA)
Execution time: 6940.240 ms
(12 rows)