+ Customer noticed an unexpected increase in space usage (approx. 24% more) on one of the segments and its mirror compared to the other segments in the cluster.
+ The difference was visible on the segments using the Linux command below, which shows the top 10 files/directories by size in the segment data directory, as in the example:
[gpadmin@ip-10-xxx-xxx-xxx gpseg3]$ du . | sort -nr | head -n10
1819655348 .
1812549444	./base
1807331052	./base/16517
5150928	./pg_log
2121684	./base/16515
2077256	./base/pgsql_tmp
1507348	./pg_xlog
414736	./global
377392	./base/16514
318608	./base/16384
[gpadmin@ip-10-xxx-xxx-xxx gpseg7]$ du . | sort -n -r | head -n 10
1253123416 .
1246030092	./base
1239046632	./base/16517
5170336	./pg_log
4167196	./base/pgsql_tmp
1836220	./base/16515
1507332	./pg_xlog
405392	./global
367768	./base/16514
308780	./base/16384
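+ Comparing the two outputs, nearly all of the extra space on gpseg3 sits under base/16517, i.e. inside a single database's data directory, which is what the resolution steps below drill into.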
All Greenplum Database versions.
Hosted on public cloud, private cloud, or on-premises.
+ Identify the largest relfilenode, or the one that stands out by size compared to other segments, in the segment data directory, as shown below.
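For example, the largest data files on the affected segment can be listed directly on the segment host (a sketch; base/16517 is the database directory taken from the du output above, and the trailing numbers in the file names are the relfilenodes; files larger than 1 GB are split into extents named <relfilenode>.1, <relfilenode>.2, and so on):
[gpadmin@ip-10-xxx-xxx-xxx gpseg3]$ du -a base/16517 | sort -nr | head -n 10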
+ Find the relation name from the relfilenode using the command (9104131 is an example relfilenode):
select gp_segment_id,relname,relfilenode from gp_dist_random('pg_class') where relfilenode=9104131;
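The schema name needed in the next step can be looked up in the same way (a sketch; 9104131 is again the example relfilenode, and the join assumes pg_namespace is consistent across the cluster, which is the normal case):
select distinct n.nspname, c.relname from gp_dist_random('pg_class') c join pg_namespace n on c.relnamespace = n.oid where c.relfilenode=9104131;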
+ Identify the relation size on all the segments using the command:
select gp_segment_id,pg_total_relation_size('<schema_name>.<identified relation name>') from gp_dist_random('gp_id');
+ The problematic segment should show up in the output as the odd one out.
+ There may be a few or more such relations, depending on the overall size of the relations involved; the query below can help narrow them down.
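If several relations are involved, a per-segment listing of the largest relations can find them in one pass (a sketch; it assumes pg_relation_size is evaluated on each segment when queried through gp_dist_random, and relkind='r' restricts the output to tables):
select gp_segment_id, relname, pg_relation_size(oid) as size_bytes from gp_dist_random('pg_class') where relkind='r' order by size_bytes desc limit 20;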
+ Check whether any orphaned processes are running on any of the segments using the command:
select * from gp_dist_random('pg_stat_activity');
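Note that the pg_stat_activity column names differ by major version (for example procpid/current_query in Greenplum 4.x and 5.x versus pid/query in 6.x). A session that shows up on a segment in this output but has no matching sess_id in pg_stat_activity on the master is a candidate orphan.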
+ Once identified, verify the PID of the orphaned process(es) on each segment host using gpssh:
gpssh -f <segment host file>
=> ps -ef | grep <PID>
+ Terminate the identified orphaned process on the affected segment host, as shown below.
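For example, the process can be killed over gpssh (a sketch; replace <segment host> and <PID> with the values identified above, and use plain kill (SIGTERM) rather than kill -9, which would force crash recovery on that segment):
gpssh -h <segment host> -e 'kill <PID>'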
+ Reclaim the space by running VACUUM FULL on the identified problematic tables.
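For example (a sketch; VACUUM FULL takes an ACCESS EXCLUSIVE lock on the table, so run it in a maintenance window):
vacuum full <schema_name>.<identified relation name>;
Afterwards, re-run the pg_total_relation_size query above to confirm the segment sizes are back in line.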