[gpadmin@hdw1 ~]$ psql psql (8.2.15) Type "help" for help. gpadmin=# show optimizer; optimizer ----------- on (1 row) gpadmin=# create table test_columns (id int, testdata char(39)) with (appendonly=true, orientation=parquet) distributed randomly; CREATE TABLE gpadmin=# create external table ext_test_columns (like test_columns) LOCATION ('pxf://localhost:8020/tmp/hawq_testdata/test_data.csv?Profile=HdfsTextSimple') FORMAT 'CSV'; CREATE EXTERNAL TABLE gpadmin=# explain analyze insert into test_columns SELECT * from ext_test_columns; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Insert (cost=0.00..21281.23 rows=166667 width=16) Rows out: Avg 2500000.0 rows x 2 workers. Max 3222116 rows (seg0) with 166 ms to first row, 13583 ms to end, start offset by 249 ms. Executor memory: 1K bytes avg, 1K bytes max (seg0). -> Result (cost=0.00..447.89 rows=166667 width=32) Rows out: Avg 2500000.0 rows x 2 workers. Max 3222116 rows (seg0) with 159 ms to first row, 8947 ms to end, start offset by 249 ms. -> External Scan on ext_test_columns (cost=0.00..437.60 rows=166667 width=16) Rows out: Avg 2500000.0 rows x 2 workers. Max 3222116 rows (seg0) with 159 ms to first row, 7440 ms to end, start offset by 249 ms. Slice statistics: (slice0) Executor memory: 3251K bytes avg x 6 workers, 9449K bytes max (seg0). Statement statistics: Memory used: 128000K bytes Total runtime: 14419.207 ms (12 rows) gpadmin=# select gp_segment_id, count(*) from test_columns group by 1; gp_segment_id | count ---------------+--------- 0 | 3222116 2 | 1777884 (2 rows) gpadmin=# alter table test_columns set WITH (reorganize=true) distributed randomly; ALTER TABLE gpadmin=# select gp_segment_id, count(*) from test_columns group by 1; gp_segment_id | count ---------------+--------- 0 | 3222116 2 | 1777884 (2 rows)The query optimizer does not perform a redistribute motion when adding or redistributing data in randomly distributed HAWQ tables.
If the data has already been inserted into the table, disable the optimizer and alter the table as shown below:
gpadmin=# set optimizer=off; SET gpadmin=# alter table test_columns set WITH (reorganize=true) distributed randomly; ALTER TABLE gpadmin=# select gp_segment_id, count(*) from test_columns group by 1; gp_segment_id | count ---------------+--------
1 | 833354 4 | 833354 2 | 833358 0 | 833288 3 | 833277 5 | 833369 (6 rows)