Indexes are created on the table, but Greenplum seems to skip them and does not scan the data via the Index.
In this article, we will demonstrate how to force the index scan on the table.
The optimizer of Postgres or Greenplum is cost-based and selects the execution plan with the lowest cost. In Greenplum, the cost for one scattered IO access is measured by the random_page_cost parameter, which is 100 by default. As compared to 1 in Postgres, it is quite large; it is set in this way because we want to choose seq scan as frequently as possible in most cases.
Note- Applications built on Greenplum are most likely to be data warehouse applications. Keep in meep in mind that the cost-based optimizer is being used, so random_page_cost should not be changed to a smaller value in most cases.
gpadmin=# create table t as select generate_series x from generate_series(1,10000); NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'x' as the Greenplum Database data distribution key for this table. HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. SELECT 10000
gpadmin=# create index f_idx on t( lower(x)); CREATE INDEX -- Explain of the query shows a sequential scan on the table gpadmin=# explain select * from t where lower(x) = '1'; QUERY PLAN ------------------------------------------------------------------------------ Gather Motion 8:1 (slice1; segments: 8) (cost=0.00..191.00 rows=2 width=4) -> Seq Scan on t (cost=0.00..191.00 rows=2 width=4) Filter: lower(x::text) = '1'::text (3 rows)
gpadmin=# set random_page_cost to 1; SET gpadmin=# explain select * from t where lower(x) = '1'; QUERY PLAN ---------------------------------------------------------------------------- Gather Motion 8:1 (slice1; segments: 8) (cost=0.00..9.43 rows=2 width=4) -> Index Scan using f_idx on t (cost=0.00..9.43 rows=2 width=4) Index Cond: lower(x::text) = '1'::text Settings: random_page_cost=1 (4 rows)
This procedure should achieve the required results.
Pivotal Greenplum Database (GPDB) all versions