When performing a hash join against very large datasets, the following error is produced:
- Extreme skew in the innerside of Hashjoin - could not write to temporary file: No space left on device
Error Messages from pg_log
:
"Extreme skew in the innerside of Hashjoin, nbatch 32, mintuples 6672, maxtuples 1258717",,,,,," create table summary_topic_tmp with (appendonly=true, orientation=parquet, compresstype=snappy) as select t2.reader_id, t3.topic_id, sum(t2.pv) as pv from ( select st.reader_id, st.vertical_id, sum(st.pv) as pv from ( select date, reader_id, vertical_id, pv from daily_201609 union select date, reader_id, vertical_id, pv from daily_201608 union select date, reader_id, vertical_id, pv from daily_201607 union select date, reader_id, vertical_id, pv from daily_201606 ) st where st.date between '2016-06-26' and '2016-09-24' group by st.reader_id, st.vertical_id ) t2 inner join table02 t3 on t2.vertical_id = t3.vertical_id group by t2.reader_id, t3.topic_id; : : "ERROR","58030","could not write to temporary file: No space left on device",,,,,,"
The errors indicate that the statistics for some/all of the tables used in the query are not up-to-date. This caused extreme data skew and subsequent large spills on a one or more segment nodes which caused the No space left on device error.
Analyze the EXPLAIN PLAN generated by the Greenplum database for the query. If some slices show very low number of rows or rows=1, It indicates that the table statistics may not be up-to-date.
: -> Table Scan on daily_201609 (cost=0.00..431.00 rows=1 width=28) Filter: date >= '2016-06-26'::date AND date <= '2016-09-24'::date :
Run ANALYZE against all the tables referenced in the query.
It would also be advisable to run VACUUM on each of the tables before ANALYZE.