Query scans all leaf partitions of a table if a WHERE condition contains more than 100 elements.
search cancel

Query scans all leaf partitions of a table if a WHERE condition contains more than 100 elements.

book

Article ID: 297039

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

Query scans all leaf partitions of a table if a WHERE condition contains more than 100 elements, that is there is no partition elimination.
For example:
SELECT * FROM partition_table_01 WHERE column_a in (1,2,3,4,....) -- specify more than 100 different values in the WHERE clause

If there are 100 or less values in the WHERE clause, the partition elimination wil occur and may result in only some of the partitions being scanned.

The following steps will show what happens:

[1] Create Database, table and inset sample data into the table.

$ vi 01_create_table.sql
CREATE TABLE pa_test (
    pdate text NOT NULL,
    data1 text NOT NULL
)
DISTRIBUTED randomly
PARTITION BY RANGE(pdate)
(
    PARTITION p2019 START ('20190101'::text) END ('20200101'::text),
    PARTITION p2020 START ('20200101'::text) END ('20210101'::text),
    PARTITION p2021 START ('20210101'::text) END ('20220101'::text),
    PARTITION p2022 START ('20220101'::text) END ('20230101'::text),
    DEFAULT PARTITION pother
);

$ vi 02_data_insert.sql
SELECT 'insert into pa_test values ('''||RES.WK_DT||''',''a'');'
FROM (SELECT TO_CHAR(GENERATE_SERIES('20190101'::TIMESTAMP, '20220701'::TIMESTAMP, '1 Day'), 'YYYYMMDD') AS WK_DT) RES
ORDER BY RES.WK_DT;

analyze pa_test;

$ createdb testdb
$ psql -d testdb -f ./01_create_table.sql
$ psql -d testdb -f ./02_data_insert.sql

[2] Run select query to see count of record.

$ vi 03_select.sql
SELECT ','''||RES.WK_DT||''''
FROM (SELECT TO_CHAR(GENERATE_SERIES('20220422'::TIMESTAMP, '20220731'::TIMESTAMP, '1 Day'), 'YYYYMMDD') AS WK_DT) RES
ORDER BY RES.WK_DT;

$ psql -d testdb -f ./03_select.sql

[3] Run explain select query to see how query scan partition tables

$ psql -d testdb -f ./04_compar_partition_scan.sql
~~ snip
 Gather Motion 6:1  (slice1; segments: 6)  (cost=0.00..431.00 rows=1 width=16)
   ->  Sequence  (cost=0.00..431.00 rows=1 width=16)
         ->  Partition Selector for pa_test (dynamic scan id: 1)  (cost=10.00..100.00 rows=17 width=4)
               Partitions selected: 1 (out of 5) --> Look at this part
         ->  Dynamic Seq Scan on pa_test (dynamic scan id: 1)  (cost=0.00..431.00 rows=1 width=16)
               Filter: (pdate = ANY ('{20220424,20220425,20220426,20220427,20220428,20220429,20220430,20220501,20220502,20220503,20220504,20220505,20220506,20220507,202205
08,20220509,20220510,20220511,20220512,20220513,20220514,20220515,20220516,20220517,20220518,20220519,20220520,20220521,20220522,20220523,20220524,20220525,20220526,202205
~~ snip
Time: 277.829 ms
SET
Time: 2.453 ms
~~ snip
 Gather Motion 6:1  (slice1; segments: 6)  (cost=0.00..431.00 rows=1 width=16)
   ->  Sequence  (cost=0.00..431.00 rows=1 width=16)
         ->  Partition Selector for pa_test (dynamic scan id: 1)  (cost=10.00..100.00 rows=17 width=4)
               Partitions selected: 5 (out of 5)  --> Look at this part
         ->  Dynamic Seq Scan on pa_test (dynamic scan id: 1)  (cost=0.00..431.00 rows=1 width=16)
~~ snip

When there are more than 100 values in the WHERE clause the GPorca planner will scan all partitions and can result in very high load and take a long time to complete.
 


Environment

Product Version: 6.21
OS: RHEL or CentOS 7

Resolution

In order to solve it as workaround there should be limited with 100 or fewer values in the WHERE clause or use range condition.

This issue is expected default behaviour.
The GUC optimizer_array_expansion_threshold, which defaults to 100, controls the behaviour.
Increasing the value will allow partition elimination to take place with a greater number of values in the WHERE clause.

NOTE: Changing the value of this parameter changes the trade-off between a shorter optimization time and lower memory consumption, and the potential benefits from constraint derivation during query optimization, for example conflict detection and partition elimination.

Continuing the example above, we see that partition elimination is achieved in both queries when the GUC optimizer_array_expansion_threshold is increased:

$ psql -c "show optimizer_array_expansion_threshold;" testdb; 
-------------------------------------
 1000   # ~~~~~----> see here
(1 row)
SET

$ psql -d testdb -f ./04_compar_partition_scan.sql optimizer_array_expansion_threshold
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..431.00 rows=1 width=16)
   ->  Sequence  (cost=0.00..431.00 rows=1 width=16)
         ->  Partition Selector for pa_test (dynamic scan id: 1)  (cost=10.00..100.00 rows=50 width=4)
               Partitions selected: 1 (out of 5) ~~~~~-> see here
         ->  Dynamic Seq Scan on pa_test (dynamic scan id: 1)  (cost=0.00..431.00 rows=1 width=16)
               Filter: (pdate = ANY ('{20220424,20220425,20220426,20220427,20220428,20220429,20220430,20220501,20220502,20220503,20220504,20220505,20220506,20220507,20220508,20220509,20220510,20220511,20220512,20220513,20220514,20220515,20220516,20220517,20220518,20220519,20220520,202
20521,20220522,20220523,20220524,20220525,20220526,20220527,20220528,20220529,20220530,20220531,20220601,20220602,20220603,20220604,20220605,20220606,20220607,20220608,20220609,20220610,20220611,20220612,20220613,20220614,20220615,20220616,20220617,20220618,20220619,20220620,20220621,
20220622,20220623,20220624,20220625,20220626,20220627,20220628,20220629,20220630,20220701,20220702,20220703,20220704,20220705,20220706,20220707,20220708,20220709,20220710,20220711,20220712,20220713,20220714,20220715,20220716,20220717,20220718,20220719,20220720,20220721,20220722,202207
23,20220724,20220725,20220726,20220727,20220728,20220729,20220730,20220731}'::text[]))
 Optimizer: Pivotal Optimizer (GPORCA)
(7 rows)
SET

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..431.00 rows=1 width=16)
   ->  Sequence  (cost=0.00..431.00 rows=1 width=16)
         ->  Partition Selector for pa_test (dynamic scan id: 1)  (cost=10.00..100.00 rows=50 width=4)
               Partitions selected: 1 (out of 5) ~~~~~-> see here
         ->  Dynamic Seq Scan on pa_test (dynamic scan id: 1)  (cost=0.00..431.00 rows=1 width=16)
               Filter: (pdate = ANY ('{20220424,20220425,20220426,20220427,20220428,20220429,20220430,20220501,20220502,20220503,20220504,20220505,20220506,20220507,20220508,20220509,20220510,20220511,20220512,20220513,20220514,20220515,20220516,20220517,20220518,20220519,20220520,202
20521,20220522,20220523,20220524,20220525,20220526,20220527,20220528,20220529,20220530,20220531,20220601,20220602,20220603,20220604,20220605,20220606,20220607,20220608,20220609,20220610,20220611,20220612,20220613,20220614,20220615,20220616,20220617,20220618,20220619,20220620,20220621,
20220622,20220623,20220624,20220625,20220626,20220627,20220628,20220629,20220630,20220701,20220702,20220703,20220704,20220705,20220706,20220707,20220708,20220709,20220710,20220711,20220712,20220713,20220714,20220715,20220716,20220717,20220718,20220719,20220720,20220721,20220722,202207
23,20220724,20220725,20220726,20220727,20220728,20220729,20220730,20220731,20220801,20220802}'::text[]))
 Optimizer: Pivotal Optimizer (GPORCA)
(7 rows)

Please refer the following technical explanation about optimizer_array_expansion_threshold described at the link[1] in order to understand why this problem occurs.

optimizer_array_expansion_threshold
When GPORCA is enabled (the default) and is processing a query that contains a predicate with
a constant array, the optimizer_array_expansion_threshold parameter limits the optimization
process based on the number of constants in the array. If the array in the query predicate
contains more than the number elements specified by parameter, GPORCA deactivates the transformation
of the predicate into its disjunctive normal form during query optimization.
 
The default value is 100.
 
For example, when GPORCA is running a query that contains an IN clause with more than 100 elements,
GPORCA does not transform the predicate into its disjunctive normal form during query optimization
to reduce optimization time consume less memory. The difference in query processing can be
seen in the filter condition for the IN clause of the query EXPLAIN plan.
 
Changing the value of this parameter changes the trade-off between a shorter optimization time
and lower memory consumption, and the potential benefits from constraint derivation during
query optimization, for example conflict detection and partition elimination.
 
The parameter can be set for a database system, an individual database, or a session or query.
See optimizer_array_expansion_threshold for full details.