When using the Pivotal Query Optimizer (GUC optimizer=on) on a table that has a BITMAP index and a query is executed with an IN clause, this produces unexpected query plans that do not use the BITMAP index.
Query Plan:
CREATE TABLE bitmap_test as SELECT * FROM generate_series(1,1000000) as a distributed randomly; CREATE INDEX bitmap_index ON bitmap_test USING BITMAP(a); CREATE TABLE bitmap_vals (a int); INSERT INTO bitmap_vals VALUES (1); INSERT INTO bitmap_vals VALUES (47); Now that the table has been created, we can run an EXPLAIN ANALYZE with ORCA on and off to show the behavior. optimizer=off IN clause with specific values will use BITMAP INDEX SCAN gpadmin=# EXPLAIN ANALYZE SELECT * FROM bitmap_test WHERE a IN (1,47); QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Gather Motion 2:1 (slice1; segments: 2) (cost=200.62..392.21 rows=2 width=4) Rows out: 2 rows at destination with 1.185 ms to first row, 1.186 ms to end, start offset by 0.296 ms. -> Bitmap Heap Scan on bitmap_test (cost=200.62..392.21 rows=1 width=4) Recheck Cond: a = ANY ('{1,47}'::integer[]) Rows out: 2 rows (seg0) with 0.412 ms to first row, 0.418 ms to end, start offset by 0.975 ms. -> Bitmap Index Scan on bitmap_index (cost=0.00..200.61 rows=1 width=0) Index Cond: a = ANY ('{1,47}'::integer[]) Bitmaps out: Avg 1.0 x 2 workers. Max 1 (seg0) with 0.374 ms to end, start offset by 0.999 ms. Slice statistics: (slice0) Executor memory: 326K bytes. (slice1) Executor memory: 615K bytes avg x 2 workers, 813K bytes max (seg0). Statement statistics: Memory used: 128000K bytes Settings: optimizer=off Optimizer status: legacy query optimizer Total runtime: 1.600 ms (16 rows)
optimizer=on IN clause with specific values will use TABLE SCAN
gpadmin=# EXPLAIN ANALYZE SELECT * FROM bitmap_test WHERE a IN (1,47); QUERY PLAN ------------------------------------------------------------------------------------------------------ Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..456.84 rows=3 width=4) Rows out: 2 rows at destination with 63 ms to first row, 65 ms to end, start offset by 0.223 ms. -> Table Scan on bitmap_test (cost=0.00..456.84 rows=2 width=4) Filter: a = ANY ('{1,47}'::integer[]) Rows out: 2 rows (seg0) with 0.023 ms to first row, 63 ms to end, start offset by 0.615 ms. Slice statistics: (slice0) Executor memory: 318K bytes. (slice1) Executor memory: 131K bytes avg x 2 workers, 131K bytes max (seg0). Statement statistics: Memory used: 128000K bytes Settings: optimizer=on Optimizer status: PQO version 1.632 Total runtime: 65.108 ms (13 rows) optimizer=off IN clause with subselect will use SEQ SCAN gpadmin=# EXPLAIN ANALYZE SELECT * FROM bitmap_test WHERE a IN (SELECT * FROM bitmap_vals); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- Gather Motion 2:1 (slice2; segments: 2) (cost=1.05..13624.73 rows=20 width=4) Rows out: 2 rows at destination with 104 ms to first row, 131 ms to end, start offset by 7.045 ms. -> Hash Join (cost=1.05..13624.73 rows=10 width=4) Hash Cond: bitmap_test.a = bitmap_vals.a Rows out: 2 rows (seg0) with 6.288 ms to first row, 103 ms to end, start offset by 8.437 ms. Executor memory: 1K bytes avg, 1K bytes max (seg0). Work_mem used: 1K bytes avg, 1K bytes max (seg0). Workfile: (0 spilling, 0 reused) (seg0) Hash chain length 1.0 avg, 1 max, using 2 of 262151 buckets. -> Seq Scan on bitmap_test (cost=0.00..11119.18 rows=500859 width=4) Rows out: Avg 500000.0 rows x 2 workers. Max 500000 rows (seg0) with 0.540 ms to first row, 36 ms to end, start offset by 14 ms. -> Hash (cost=1.04..1.04 rows=1 width=4) Rows in: Avg 2.0 rows x 2 workers. Max 2 rows (seg0) with 2.190 ms to end, start offset by 12 ms. -> Broadcast Motion 2:2 (slice1; segments: 2) (cost=1.01..1.04 rows=1 width=4) Rows out: Avg 2.0 rows x 2 workers at destination. Max 2 rows (seg0) with 1.201 ms to first row, 2.183 ms to end, start offset by 12 ms. -> HashAggregate (cost=1.01..1.02 rows=1 width=4) Group By: bitmap_vals.a Rows out: 2 rows (seg0) with 2.907 ms to first row, 3.685 ms to end, start offset by 9.191 ms. Executor memory: 8245K bytes avg, 8281K bytes max (seg0). -> Seq Scan on bitmap_vals (cost=0.00..1.01 rows=1 width=4) Rows out: 3 rows (seg0) with 0.084 ms to first row, 0.087 ms to end, start offset by 11 ms. Slice statistics: (slice0) Executor memory: 322K bytes. (slice1) Executor memory: 8485K bytes avg x 2 workers, 8523K bytes max (seg0). (slice2) Executor memory: 4295K bytes avg x 2 workers, 4295K bytes max (seg0). Work_mem: 1K bytes max. Statement statistics: Memory used: 128000K bytes Settings: optimizer=off Optimizer status: legacy query optimizer Total runtime: 138.194 ms (29 rows)
optimizer=on IN clause with subselect will use BITMAP TABLE SCAN
gpadmin=# EXPLAIN ANALYZE SELECT * FROM bitmap_test WHERE a IN (SELECT * FROM bitmap_vals); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------- Gather Motion 2:1 (slice2; segments: 2) (cost=0.00..431.07 rows=1 width=4) Rows out: 2 rows at destination with 2.605 ms to first row, 2.606 ms to end, start offset by 5.842 ms. -> Nested Loop (cost=0.00..431.07 rows=1 width=4) Join Filter: true Rows out: 2 rows (seg0) with 0.122 ms to first row, 0.398 ms to end, start offset by 7.957 ms. -> Broadcast Motion 2:2 (slice1; segments: 2) (cost=0.00..431.00 rows=1 width=4) Rows out: Avg 2.0 rows x 2 workers at destination. Max 2 rows (seg0) with 0.008 ms to first row, 0.242 ms to end, start offset by 7.975 ms. -> GroupAggregate (cost=0.00..431.00 rows=1 width=4) Group By: bitmap_vals.a Rows out: 2 rows (seg0) with 0.277 ms to first row, 0.278 ms to end, start offset by 7.482 ms. -> Sort (cost=0.00..431.00 rows=1 width=4) Sort Key: bitmap_vals.a Rows out: 3 rows (seg0) with 0.264 ms to end, start offset by 7.489 ms. Executor memory: 58K bytes avg, 58K bytes max (seg0). Work_mem used: 58K bytes avg, 58K bytes max (seg0). Workfile: (0 spilling, 0 reused) -> Table Scan on bitmap_vals (cost=0.00..431.00 rows=1 width=4) Rows out: 3 rows (seg0) with 0.048 ms to first row, 0.050 ms to end, start offset by 7.691 ms. -> Bitmap Table Scan on bitmap_test (cost=0.00..0.07 rows=1 width=4) Recheck Cond: bitmap_test.a = bitmap_vals.a Rows out: 2 rows (seg0) with 0.083 ms to first row, 0.122 ms to end of 2 scans, start offset by 8.083 ms. -> Bitmap Index Scan on bitmap_index (cost=0.00..0.00 rows=0 width=0) Index Cond: bitmap_test.a = bitmap_vals.a Bitmaps out: Avg 2.0 x 2 workers. Max 2 (seg0) with 0.057 ms to first, 0.085 ms to end of 2 scans, start offset by 8.088 ms. Slice statistics: (slice0) Executor memory: 330K bytes. (slice1) Executor memory: 293K bytes avg x 2 workers, 293K bytes max (seg0). Work_mem: 58K bytes max. (slice2) Executor memory: 551K bytes avg x 2 workers, 641K bytes max (seg0). Statement statistics: Memory used: 128000K bytes Settings: optimizer=on Optimizer status: PQO version 1.632 Total runtime: 8.860 ms (32 rows)
This issue is caused by a software defect in Greenplum logged under MPP-26427. The Pivotal Query Optimizer/ORCA fails to consider the case that uses ArrayComp when trying to use a BITMAP index.
This defect has been fixed in 4.3.9.1 and above.