When user wants to run /opt/GPDSB/tpcds.sh as role ${ADMIN_USER} against a custom database, for example, if people wish to run tpcds as gpadmin user against a database named test, they can simply change the following line in gpdsb_variables.sh and then source it:
export PGDATABASE="test"
and executing the script ./tpcds.sh it'd fail with:
psql:/opt/GPDSB/05_sql/101.dsbench.01.sql:25: ERROR: relation "store_returns" does not exist
LINE 5: from store_returns
GPDSB 1.0
The query content indicated that it's querying the table store_returns without specifying any schema:
$ cat /opt/GPDSB/05_sql/101.dsbench.01.sql
set role dsbench;
:EXPLAIN_ANALYZE
-- start query 12 in stream 0 using template query1.tpl and seed 893351816
with customer_total_return as
(select sr_customer_sk as ctr_customer_sk
,sr_store_sk as ctr_store_sk
,sum(SR_RETURN_TAX) as ctr_total_return
from store_returns
,date_dim
where sr_returned_date_sk = d_date_sk
and d_year =2001
group by sr_customer_sk
,sr_store_sk)
select c_customer_id
from customer_total_return ctr1
,store
,customer
where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
from customer_total_return ctr2
where ctr1.ctr_store_sk = ctr2.ctr_store_sk)
and s_store_sk = ctr1.ctr_store_sk
and s_state = 'TN'
and ctr1.ctr_customer_sk = c_customer_sk
order by c_customer_id
limit 100;
-- end query 12 in stream 0 using template query1.tpl
This table was created under tpcds schema during previous steps:
$ cat 03_ddl/018.gpdb.store_returns.sql
CREATE TABLE tpcds.store_returns (
sr_returned_date_sk integer,
sr_return_time_sk integer,
sr_item_sk integer NOT NULL,
sr_customer_sk integer,
sr_cdemo_sk integer,
sr_hdemo_sk integer,
sr_addr_sk integer,
sr_store_sk integer,
sr_reason_sk integer,
sr_ticket_number bigint NOT NULL,
sr_return_quantity integer,
sr_return_amt numeric(7,2),
sr_return_tax numeric(7,2),
sr_return_amt_inc_tax numeric(7,2),
sr_fee numeric(7,2),
sr_return_ship_cost numeric(7,2),
sr_refunded_cash numeric(7,2),
sr_reversed_charge numeric(7,2),
sr_store_credit numeric(7,2),
sr_net_loss numeric(7,2)
)
WITH (:MEDIUM_STORAGE)
:DISTRIBUTED_BY
partition by range(sr_returned_date_sk)
(start(2450815) INCLUSIVE end(2453005) INCLUSIVE every (100),
default partition others);
However, GPDSB tpcds.sh did not alter the search_path for custom database, it only altered for database ${ADMIN_USER}, so for a custom database the search_path remain as default public, which further result in "relation does not exist" error.
$ grep -n search_path /opt/GPDSB/03_ddl/rollout.sh.bak
80:SetSearchPath="ALTER database ${ADMIN_USER} SET search_path=tpcds, \"\${ADMIN_USER}\", public"
95:log_time "Set search_path for database ${ADMIN_USER}"
Resolution
Upgrade to GPDSB1.1 to fix this issue.
Temporary Workaround
Once tpcds schema is crearted, run ALTER DATABASE <database_name> SET search_path TO tpcds, public; to change the search path
2 more possible scenario if the error persists even after performing the workaround
Scenario #1
If there's any role config on ${ADMIN_USER} to set search_path, it will override the above query. For example if here is a ${ADMIN_USER} - gpadmin is with:
gpadmin=# select rolname, rolconfig from pg_roles where rolname='gpadmin';
rolname | rolconfig
---------+-----------
gpadmin | {"search_path=\"$user\", public"}
(1 row)
The tpc-ds benchmark run will keep failing, in this case, either:
- Reset the role search_path if it's not required
alter role gpadmin reset search_path ;
- Add tpcds into the search_path:
alter role gpadmin set search_path=tpcds, "$user", public ;
Scenario #2
- If there are any statement that was trying to set search_path when psql via .psqlrc, it will also result in the same error, for example:
# cat /home/gpadmin/.psqlrc
set search_path=public;
- In this case remove the line in .psqlrc if it is not required.