Query scans all leaf partitions of a table if a WHERE condition contains more than 100 elements, that is there is no partition elimination.
For example:
SELECT * FROM partition_table_01 WHERE column_a in (1,2,3,4,....) -- specify more than 100 different values in the WHERE clause
If there are 100 or less values in the WHERE clause, the partition elimination wil occur and may result in only some of the partitions being scanned.
The following steps will show what happens:
$ vi 01_create_table.sql CREATE TABLE pa_test ( pdate text NOT NULL, data1 text NOT NULL ) DISTRIBUTED randomly PARTITION BY RANGE(pdate) ( PARTITION p2019 START ('20190101'::text) END ('20200101'::text), PARTITION p2020 START ('20200101'::text) END ('20210101'::text), PARTITION p2021 START ('20210101'::text) END ('20220101'::text), PARTITION p2022 START ('20220101'::text) END ('20230101'::text), DEFAULT PARTITION pother ); $ vi 02_data_insert.sql SELECT 'insert into pa_test values ('''||RES.WK_DT||''',''a'');' FROM (SELECT TO_CHAR(GENERATE_SERIES('20190101'::TIMESTAMP, '20220701'::TIMESTAMP, '1 Day'), 'YYYYMMDD') AS WK_DT) RES ORDER BY RES.WK_DT; analyze pa_test; $ createdb testdb $ psql -d testdb -f ./01_create_table.sql $ psql -d testdb -f ./02_data_insert.sql
$ vi 03_select.sql SELECT ','''||RES.WK_DT||'''' FROM (SELECT TO_CHAR(GENERATE_SERIES('20220422'::TIMESTAMP, '20220731'::TIMESTAMP, '1 Day'), 'YYYYMMDD') AS WK_DT) RES ORDER BY RES.WK_DT; $ psql -d testdb -f ./03_select.sql
$ psql -d testdb -f ./04_compar_partition_scan.sql ~~ snip Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..431.00 rows=1 width=16) -> Sequence (cost=0.00..431.00 rows=1 width=16) -> Partition Selector for pa_test (dynamic scan id: 1) (cost=10.00..100.00 rows=17 width=4) Partitions selected: 1 (out of 5) --> Look at this part -> Dynamic Seq Scan on pa_test (dynamic scan id: 1) (cost=0.00..431.00 rows=1 width=16) Filter: (pdate = ANY ('{20220424,20220425,20220426,20220427,20220428,20220429,20220430,20220501,20220502,20220503,20220504,20220505,20220506,20220507,202205 08,20220509,20220510,20220511,20220512,20220513,20220514,20220515,20220516,20220517,20220518,20220519,20220520,20220521,20220522,20220523,20220524,20220525,20220526,202205 ~~ snip Time: 277.829 ms SET Time: 2.453 ms ~~ snip Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..431.00 rows=1 width=16) -> Sequence (cost=0.00..431.00 rows=1 width=16) -> Partition Selector for pa_test (dynamic scan id: 1) (cost=10.00..100.00 rows=17 width=4) Partitions selected: 5 (out of 5) --> Look at this part -> Dynamic Seq Scan on pa_test (dynamic scan id: 1) (cost=0.00..431.00 rows=1 width=16) ~~ snip
When there are more than 100 values in the WHERE clause the GPorca planner will scan all partitions and can result in very high load and take a long time to complete.
In order to solve it as workaround there should be limited with 100 or fewer values in the WHERE clause or use range condition.
This issue is expected default behaviour.
The GUC optimizer_array_expansion_threshold, which defaults to 100, controls the behaviour.
Increasing the value will allow partition elimination to take place with a greater number of values in the WHERE clause.
NOTE: Changing the value of this parameter changes the trade-off between a shorter optimization time and lower memory consumption, and the potential benefits from constraint derivation during query optimization, for example conflict detection and partition elimination.
Continuing the example above, we see that partition elimination is achieved in both queries when the GUC optimizer_array_expansion_threshold is increased:
$ psql -c "show optimizer_array_expansion_threshold;" testdb; ------------------------------------- 1000 # ~~~~~----> see here (1 row) SET $ psql -d testdb -f ./04_compar_partition_scan.sql optimizer_array_expansion_threshold QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------- Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..431.00 rows=1 width=16) -> Sequence (cost=0.00..431.00 rows=1 width=16) -> Partition Selector for pa_test (dynamic scan id: 1) (cost=10.00..100.00 rows=50 width=4) Partitions selected: 1 (out of 5) ~~~~~-> see here -> Dynamic Seq Scan on pa_test (dynamic scan id: 1) (cost=0.00..431.00 rows=1 width=16) Filter: (pdate = ANY ('{20220424,20220425,20220426,20220427,20220428,20220429,20220430,20220501,20220502,20220503,20220504,20220505,20220506,20220507,20220508,20220509,20220510,20220511,20220512,20220513,20220514,20220515,20220516,20220517,20220518,20220519,20220520,202 20521,20220522,20220523,20220524,20220525,20220526,20220527,20220528,20220529,20220530,20220531,20220601,20220602,20220603,20220604,20220605,20220606,20220607,20220608,20220609,20220610,20220611,20220612,20220613,20220614,20220615,20220616,20220617,20220618,20220619,20220620,20220621, 20220622,20220623,20220624,20220625,20220626,20220627,20220628,20220629,20220630,20220701,20220702,20220703,20220704,20220705,20220706,20220707,20220708,20220709,20220710,20220711,20220712,20220713,20220714,20220715,20220716,20220717,20220718,20220719,20220720,20220721,20220722,202207 23,20220724,20220725,20220726,20220727,20220728,20220729,20220730,20220731}'::text[])) Optimizer: Pivotal Optimizer (GPORCA) (7 rows) SET QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------- Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..431.00 rows=1 width=16) -> Sequence (cost=0.00..431.00 rows=1 width=16) -> Partition Selector for pa_test (dynamic scan id: 1) (cost=10.00..100.00 rows=50 width=4) Partitions selected: 1 (out of 5) ~~~~~-> see here -> Dynamic Seq Scan on pa_test (dynamic scan id: 1) (cost=0.00..431.00 rows=1 width=16) Filter: (pdate = ANY ('{20220424,20220425,20220426,20220427,20220428,20220429,20220430,20220501,20220502,20220503,20220504,20220505,20220506,20220507,20220508,20220509,20220510,20220511,20220512,20220513,20220514,20220515,20220516,20220517,20220518,20220519,20220520,202 20521,20220522,20220523,20220524,20220525,20220526,20220527,20220528,20220529,20220530,20220531,20220601,20220602,20220603,20220604,20220605,20220606,20220607,20220608,20220609,20220610,20220611,20220612,20220613,20220614,20220615,20220616,20220617,20220618,20220619,20220620,20220621, 20220622,20220623,20220624,20220625,20220626,20220627,20220628,20220629,20220630,20220701,20220702,20220703,20220704,20220705,20220706,20220707,20220708,20220709,20220710,20220711,20220712,20220713,20220714,20220715,20220716,20220717,20220718,20220719,20220720,20220721,20220722,202207 23,20220724,20220725,20220726,20220727,20220728,20220729,20220730,20220731,20220801,20220802}'::text[])) Optimizer: Pivotal Optimizer (GPORCA) (7 rows)
Please refer the following technical explanation about optimizer_array_expansion_threshold described at the link[1] in order to understand why this problem occurs.
optimizer_array_expansion_threshold When GPORCA is enabled (the default) and is processing a query that contains a predicate with a constant array, the optimizer_array_expansion_threshold parameter limits the optimization process based on the number of constants in the array. If the array in the query predicate contains more than the number elements specified by parameter, GPORCA deactivates the transformation of the predicate into its disjunctive normal form during query optimization. The default value is 100. For example, when GPORCA is running a query that contains an IN clause with more than 100 elements, GPORCA does not transform the predicate into its disjunctive normal form during query optimization to reduce optimization time consume less memory. The difference in query processing can be seen in the filter condition for the IN clause of the query EXPLAIN plan. Changing the value of this parameter changes the trade-off between a shorter optimization time and lower memory consumption, and the potential benefits from constraint derivation during query optimization, for example conflict detection and partition elimination. The parameter can be set for a database system, an individual database, or a session or query.See optimizer_array_expansion_threshold for full details.