Customers have reported ORCA is incorrectly redistributing when joining replicated tables ON GPDB version 6.28.1.
The following example compares the GPORCA and Postgres Query Optimizers, highlighting the performance differences when joining replicated tables. Due to GPORCA's incorrect redistribution of data during these joins, the query completion time is significantly longer compared to the Postgres Query Optimizer.
*** GPORCA ***
_prd=> create temp table rep1 as select generate_series(1,10000000) a distributed replicated;
SELECT 10000000
_prd=> explain select * from foo t1 join rep1 t2 using (a);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Gather Motion 960:1 (slice2; segments: 960) (cost=0.00..1430.81 rows=10000000 width=8)
-> Hash Join (cost=0.00..1254.43 rows=10417 width=8)
Hash Cond: (rep1.a = foo.a)
-> Result (cost=0.00..785.63 rows=10417 width=4)
-> Seq Scan on rep1 (cost=0.00..618.00 rows=10000000 width=4)
-> Hash (cost=437.34..437.34 rows=104164 width=8)
-> Redistribute Motion 960:960 (slice1; segments: 960) (cost=0.00..437.34 rows=104164 width=8)
Hash Key: foo.a
-> Seq Scan on foo (cost=0.00..433.18 rows=104164 width=8)
Optimizer: Pivotal Optimizer (GPORCA)
(10 rows)Time: 38.236 ms
==========================================================================================================
*** Postgres Query Optimizer ***
_prd=> set optimizer=off;
SET
_prd=> explain select * from foo t1 join rep1 t2 using (a);
QUERY PLAN
--------------------------------------------------------------------------------------------------
Gather Motion 960:1 (slice1; segments: 960) (cost=236002.00..2946884.20 rows=10000000 width=8)
-> Hash Join (cost=236002.00..2946884.20 rows=10417 width=8)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on foo t1 (cost=0.00..1110928.88 rows=104164 width=8)
-> Hash (cost=111002.00..111002.00 rows=10417 width=4)
-> Seq Scan on rep1 t2 (cost=0.00..111002.00 rows=10000000 width=4)
Optimizer: Postgres query optimizerGPDB 6.24.3
GPDB 6.28.1
Orca does not efficiently redistributing data when joining replicated tables.
To improve GPORCA's performance, the workaround is to set the optimizer_enable_replicated_table GUC to "off," which disables ORCA when running queries involving replicated tables.