Sometimes we just want to know whether any processing skew is happening at all. The command below collects the total spill size on each host and gives a straightforward summary:
gpssh -f hostfile -e "du -b /data/primary/gpseg*/base/pgsql_tmp/*" | grep -v "du -b" | sort | awk -F" " '{ arr[$1]=arr[$1]+$2; tot=tot+$2 } END { for ( i in arr ) print "Segment node", i, arr[i], "bytes (" arr[i]/(1024^3)" GB)"; print "Total", tot, "bytes (" tot/(1024^3)" GB)" }'
* This assumes that your segment data directories are /data/primary/gpseg* and that the host file is named "hostfile".
The example output is:
gpssh -f hostfile -e "du -b /data/primary/gpseg*/base/pgsql_tmp/*" | grep -v "du -b" | sort | awk -F" " '{ arr[$1]=arr[$1]+$2; tot=tot+$2 } END { for ( i in arr ) print "Segment node", i, arr[i], "bytes (" arr[i]/(1024^3)" GB)"; print "Total", tot, "bytes (" tot/(1024^3)" GB)" }'
Segment node [sdw1] xxxx bytes (x GB)
Segment node [sdw2] xxxx bytes (x GB)
Segment node [sdw3] xxxx bytes (x GB)
Segment node [sdw4] xxxx bytes (x GB)
Total xxxx bytes (x GB)
If we see that one segment node's spill size is much larger than the others, we can conclude that processing skew is very likely.
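The comparison between hosts can also be automated. The following is only a minimal sketch, not part of the original procedure: a hypothetical shell function named spill_skew that reads the same "[host] bytes path" lines the awk command above consumes, then prints each host's total together with its ratio to the per-host average. The function name, the output format, and the default threshold of 2 are all assumptions chosen for illustration.

```shell
# Hypothetical helper (name and threshold are illustrative assumptions).
# Reads "[host] <bytes> <path>" lines on stdin, i.e. the gpssh/du output
# after the echoed command lines have been filtered out, and prints each
# host's total spill size plus its ratio to the per-host average.
spill_skew() {
    awk -v threshold="${1:-2}" '
        # Register every host seen, so hosts without spill files count as 0.
        $1 ~ /^\[/ { total[$1] += 0 }
        # Only add lines whose second field is a byte count (skips du errors).
        $1 ~ /^\[/ && $2 ~ /^[0-9]+$/ { total[$1] += $2; sum += $2 }
        END {
            n = 0
            for (h in total) n++
            if (n == 0 || sum == 0) { print "No spill files found"; exit }
            avg = sum / n
            for (h in total) {
                ratio = total[h] / avg
                flag = (ratio >= threshold) ? "  <-- possible skew" : ""
                printf "%s %.0f bytes (%.2fx average)%s\n", h, total[h], ratio, flag
            }
        }
    ' | sort
}
```

Under the same assumptions as above (data directories under /data/primary/gpseg*, host file named "hostfile"), it could be used as:
gpssh -f hostfile -e "du -b /data/primary/gpseg*/base/pgsql_tmp/*" | grep -v "du -b" | spill_skew
A host flagged at 2x the average is only a hint that it deserves a closer look; the right threshold depends on the workload.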