Skewed space usage on one or a few segments in a cluster hosted on AWS, as shown in GPCC, caused by dead rows (bloat) not removed because the VACUUM job was blocked by an orphan process/query

Article ID: 381818


Products

VMware Tanzu Data Suite
Greenplum
VMware Tanzu Greenplum

Issue/Introduction

+ The customer noticed an unexpected increase in space usage (approximately 24% more) on one segment and its mirror compared to the other segments in the cluster.

+ The difference was visible on the segments using the Linux du command to list the top 10 files/directories by size in the segment data directory, as shown in the examples below.

[gpadmin@ip-10-xxx-xxx-xxx gpseg3]$ du . | sort -nr | head -n10

1819655348      .
1812549444      ./base
1807331052      ./base/16517
5150928         ./pg_log
2121684         ./base/16515
2077256         ./base/pgsql_tmp
1507348         ./pg_xlog
414736          ./global
377392          ./base/16514
318608          ./base/16384

[gpadmin@ip-10-xxx-xxx-xxx gpseg7]$ du . | sort -n -r | head -n 10

1253123416      .
1246030092      ./base
1239046632      ./base/16517
5170336         ./pg_log
4167196         ./base/pgsql_tmp
1836220         ./base/16515
1507332         ./pg_xlog
405392          ./global
367768          ./base/16514
308780          ./base/16384

Environment

All Greenplum Database versions.

Applies to clusters hosted on public cloud, private cloud, or on-premises infrastructure.

Cause

+ Identify the largest relfilenode(s) in the affected segment's data directory, or the one whose size stands out compared to the other segments; an example is shown below.
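
A minimal sketch, assuming ./base/16517 is the oversized database directory seen in the du output above (the directory OID will differ in other clusters):

# List every file under the database directory and show the 10 largest (relfilenodes)
du -a ./base/16517 | sort -nr | head -n 10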

+ Find the relation name for the relfilenode using the following command:

select gp_segment_id,relname,relfilenode from gp_dist_random('pg_class') where relfilenode=9104131;
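
If the relation's schema is needed for the size check in the next step, a catalog lookup such as the following sketch can be run on the master (assuming '<identified relation name>' is the relname returned above):

select n.nspname as schema_name, c.relname
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relname = '<identified relation name>';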

+ Identify the relation's size on all segments using the following command:

select gp_segment_id,pg_total_relation_size('<schema_name>.<identified relation name>') from gp_dist_random('gp_id');
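
For readability, the same check can be ordered by size so the skewed segment appears first. A minimal sketch, using a hypothetical table public.orders in place of the identified relation:

select gp_segment_id,
       pg_size_pretty(pg_total_relation_size('public.orders')) as segment_size
from gp_dist_random('gp_id')
order by pg_total_relation_size('public.orders') desc;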

+ The problematic segment should stand out in the output as the odd one out, with a noticeably larger size.

+ There may be a few or more such relations, depending on the overall size of each relation; a way to check for additional bloated tables is sketched below.
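
If the gp_toolkit schema is installed, its bloat diagnostics view can help confirm which heap tables carry significant bloat. This sketch complements the per-segment size check above rather than replacing it:

-- Reports heap tables whose actual page count significantly exceeds the expected count
select * from gp_toolkit.gp_bloat_diag;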

+ Check whether any orphan processes/queries are running on any of the segments using the following command:

select * from gp_dist_random('pg_stat_activity');
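
To narrow the output to likely culprits, the view can be filtered for sessions whose current query started long ago; the one-hour threshold below is only an example:

select *
from gp_dist_random('pg_stat_activity')
where query_start < now() - interval '1 hour'
order by query_start;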

+ Once identified, confirm the PID of the orphan process(es) at the OS level on the segment hosts using gpssh:

gpssh -f <segment host file>
ps -ef | grep <PID>
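
The same check can also be run non-interactively across all segment hosts in a single call; a sketch, where <PID> is the process ID found in pg_stat_activity above:

# Run ps on every host in the host file and filter for the suspect PID
gpssh -f <segment host file> 'ps -ef | grep <PID> | grep -v grep'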

Resolution

+ Terminate the identified orphan process(es) on the affected segment host(s); a sketch follows.
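
A minimal sketch of terminating the process at the OS level, assuming <PID> is the orphan process found above. SIGTERM asks the backend to exit cleanly; avoid kill -9, which forces the whole segment instance through crash recovery:

# On the affected segment host, as gpadmin
kill <PID>        # send SIGTERM, then verify with: ps -ef | grep <PID>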

+ Reclaim space by running VACUUM FULL on the identified problematic tables, as sketched below.
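
A minimal sketch, using the hypothetical table public.orders in place of the identified table. Note that VACUUM FULL takes an ACCESS EXCLUSIVE lock and rewrites the table, so it is best run in a maintenance window:

VACUUM FULL public.orders;
-- refresh planner statistics after the rewrite
ANALYZE public.orders;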