Pivotal Query Optimizer/ORCA does not Use Bitmap Indexes in Plan when using IN Clause
search cancel

Pivotal Query Optimizer/ORCA does not Use Bitmap Indexes in Plan when using IN Clause

book

Article ID: 295777

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

Symptoms:

When using the Pivotal Query Optimizer (GUC optimizer=on) on a table that has a BITMAP index and a query is executed with an IN clause, this produces unexpected query plans that do not use the BITMAP index.

  • Legacy optimizer (GUC optimizer=off) does not exhibit this behavior.
  • Affects 4.3.8.x and 4.3.9.0 (BITMAP indexes first supported by ORCA in 4.3.8.0).
  • When running the same query with a subselect in the IN clause, the optimizer will use the indexes.

Query Plan:

CREATE TABLE bitmap_test as SELECT * FROM generate_series(1,1000000) as a distributed randomly;
CREATE INDEX bitmap_index ON bitmap_test USING BITMAP(a);
CREATE TABLE bitmap_vals (a int);
INSERT INTO bitmap_vals VALUES (1);
INSERT INTO bitmap_vals VALUES (47);
Now that the table has been created, we can run an EXPLAIN ANALYZE with ORCA on and off to show the behavior.

optimizer=off IN clause with specific values will use BITMAP INDEX SCAN

gpadmin=# EXPLAIN ANALYZE SELECT * FROM bitmap_test WHERE a IN (1,47);
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=200.62..392.21 rows=2 width=4)
   Rows out:  2 rows at destination with 1.185 ms to first row, 1.186 ms to end, start offset by 0.296 ms.
   ->  Bitmap Heap Scan on bitmap_test  (cost=200.62..392.21 rows=1 width=4)
         Recheck Cond: a = ANY ('{1,47}'::integer[])
         Rows out:  2 rows (seg0) with 0.412 ms to first row, 0.418 ms to end, start offset by 0.975 ms.
         ->  Bitmap Index Scan on bitmap_index  (cost=0.00..200.61 rows=1 width=0)
               Index Cond: a = ANY ('{1,47}'::integer[])
               Bitmaps out:  Avg 1.0 x 2 workers.  Max 1 (seg0) with 0.374 ms to end, start offset by 0.999 ms.
 Slice statistics:
   (slice0)    Executor memory: 326K bytes.
   (slice1)    Executor memory: 615K bytes avg x 2 workers, 813K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
 Total runtime: 1.600 ms
(16 rows)

optimizer=on IN clause with specific values will use TABLE SCAN

gpadmin=# EXPLAIN ANALYZE SELECT * FROM bitmap_test WHERE a IN (1,47);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..456.84 rows=3 width=4)
   Rows out:  2 rows at destination with 63 ms to first row, 65 ms to end, start offset by 0.223 ms.
   ->  Table Scan on bitmap_test  (cost=0.00..456.84 rows=2 width=4)
         Filter: a = ANY ('{1,47}'::integer[])
         Rows out:  2 rows (seg0) with 0.023 ms to first row, 63 ms to end, start offset by 0.615 ms.
 Slice statistics:
   (slice0)    Executor memory: 318K bytes.
   (slice1)    Executor memory: 131K bytes avg x 2 workers, 131K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=on
 Optimizer status: PQO version 1.632
 Total runtime: 65.108 ms
(13 rows)
optimizer=off IN clause with subselect will use SEQ SCAN

gpadmin=# EXPLAIN ANALYZE SELECT * FROM bitmap_test WHERE a IN (SELECT * FROM bitmap_vals);
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice2; segments: 2)  (cost=1.05..13624.73 rows=20 width=4)
   Rows out:  2 rows at destination with 104 ms to first row, 131 ms to end, start offset by 7.045 ms.
   ->  Hash Join  (cost=1.05..13624.73 rows=10 width=4)
         Hash Cond: bitmap_test.a = bitmap_vals.a
         Rows out:  2 rows (seg0) with 6.288 ms to first row, 103 ms to end, start offset by 8.437 ms.
         Executor memory:  1K bytes avg, 1K bytes max (seg0).
         Work_mem used:  1K bytes avg, 1K bytes max (seg0). Workfile: (0 spilling, 0 reused)
         (seg0)   Hash chain length 1.0 avg, 1 max, using 2 of 262151 buckets.
         ->  Seq Scan on bitmap_test  (cost=0.00..11119.18 rows=500859 width=4)
               Rows out:  Avg 500000.0 rows x 2 workers.  Max 500000 rows (seg0) with 0.540 ms to first row, 36 ms to end, start offset by 14 ms.
         ->  Hash  (cost=1.04..1.04 rows=1 width=4)
               Rows in:  Avg 2.0 rows x 2 workers.  Max 2 rows (seg0) with 2.190 ms to end, start offset by 12 ms.
               ->  Broadcast Motion 2:2  (slice1; segments: 2)  (cost=1.01..1.04 rows=1 width=4)
                     Rows out:  Avg 2.0 rows x 2 workers at destination.  Max 2 rows (seg0) with 1.201 ms to first row, 2.183 ms to end, start offset by 12 ms.
                     ->  HashAggregate  (cost=1.01..1.02 rows=1 width=4)
                           Group By: bitmap_vals.a
                           Rows out:  2 rows (seg0) with 2.907 ms to first row, 3.685 ms to end, start offset by 9.191 ms.
                           Executor memory:  8245K bytes avg, 8281K bytes max (seg0).
                           ->  Seq Scan on bitmap_vals  (cost=0.00..1.01 rows=1 width=4)
                                 Rows out:  3 rows (seg0) with 0.084 ms to first row, 0.087 ms to end, start offset by 11 ms.
 Slice statistics:
   (slice0)    Executor memory: 322K bytes.
   (slice1)    Executor memory: 8485K bytes avg x 2 workers, 8523K bytes max (seg0).
   (slice2)    Executor memory: 4295K bytes avg x 2 workers, 4295K bytes max (seg0).  Work_mem: 1K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
 Total runtime: 138.194 ms
(29 rows)

optimizer=on IN clause with subselect will use BITMAP TABLE SCAN

gpadmin=# EXPLAIN ANALYZE SELECT * FROM bitmap_test WHERE a IN (SELECT * FROM bitmap_vals);
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice2; segments: 2)  (cost=0.00..431.07 rows=1 width=4)
   Rows out:  2 rows at destination with 2.605 ms to first row, 2.606 ms to end, start offset by 5.842 ms.
   ->  Nested Loop  (cost=0.00..431.07 rows=1 width=4)
         Join Filter: true
         Rows out:  2 rows (seg0) with 0.122 ms to first row, 0.398 ms to end, start offset by 7.957 ms.
         ->  Broadcast Motion 2:2  (slice1; segments: 2)  (cost=0.00..431.00 rows=1 width=4)
               Rows out:  Avg 2.0 rows x 2 workers at destination.  Max 2 rows (seg0) with 0.008 ms to first row, 0.242 ms to end, start offset by 7.975 ms.
               ->  GroupAggregate  (cost=0.00..431.00 rows=1 width=4)
                     Group By: bitmap_vals.a
                     Rows out:  2 rows (seg0) with 0.277 ms to first row, 0.278 ms to end, start offset by 7.482 ms.
                     ->  Sort  (cost=0.00..431.00 rows=1 width=4)
                           Sort Key: bitmap_vals.a
                           Rows out:  3 rows (seg0) with 0.264 ms to end, start offset by 7.489 ms.
                           Executor memory:  58K bytes avg, 58K bytes max (seg0).
                           Work_mem used:  58K bytes avg, 58K bytes max (seg0). Workfile: (0 spilling, 0 reused)
                           ->  Table Scan on bitmap_vals  (cost=0.00..431.00 rows=1 width=4)
                                 Rows out:  3 rows (seg0) with 0.048 ms to first row, 0.050 ms to end, start offset by 7.691 ms.
         ->  Bitmap Table Scan on bitmap_test  (cost=0.00..0.07 rows=1 width=4)
               Recheck Cond: bitmap_test.a = bitmap_vals.a
               Rows out:  2 rows (seg0) with 0.083 ms to first row, 0.122 ms to end of 2 scans, start offset by 8.083 ms.
               ->  Bitmap Index Scan on bitmap_index  (cost=0.00..0.00 rows=0 width=0)
                     Index Cond: bitmap_test.a = bitmap_vals.a
                     Bitmaps out:  Avg 2.0 x 2 workers.  Max 2 (seg0) with 0.057 ms to first, 0.085 ms to end of 2 scans, start offset by 8.088 ms.
 Slice statistics:
   (slice0)    Executor memory: 330K bytes.
   (slice1)    Executor memory: 293K bytes avg x 2 workers, 293K bytes max (seg0).  Work_mem: 58K bytes max.
   (slice2)    Executor memory: 551K bytes avg x 2 workers, 641K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=on
 Optimizer status: PQO version 1.632
 Total runtime: 8.860 ms
(32 rows)

Environment


Cause

This issue is caused by a software defect in Greenplum logged under MPP-26427.  The Pivotal Query Optimizer/ORCA fails to consider the case that uses ArrayComp when trying to use a BITMAP index.

 

Resolution

This defect has been fixed in 4.3.9.1 and above.

  1. Turn off the Pivotal optimizer (SET optimizer=off;) as the legacy optimizer does not have this problem.
  2. If utilizing an IN clause, change to an OR clause or use a subselect, which will allow the optimizer to pick up the index. Note: Using OR clause may cause performance impacts on its own.