+ Customer noticed an unexpected increase in space usage (approx. 24% more) on one of the segments and its mirror compared to the other segments in the cluster.
+ The difference was visible on the segments using the Linux command below, which shows the top 10 files/directories by size in the segment data directory, as in the example:
[gpadmin@ip-10-xxx-xxx-xxx gpseg3]$ du . | sort -nr | head -n10
1819655348 .
1812549444	./base
1807331052	./base/16517
5150928	./pg_log
2121684	./base/16515
2077256	./base/pgsql_tmp
1507348	./pg_xlog
414736	./global
377392	./base/16514
318608	./base/16384
[gpadmin@ip-10-xxx-xxx-xxx gpseg7]$ du . | sort -n -r | head -n 10
1253123416 .
1246030092	./base
1239046632	./base/16517
5170336	./pg_log
4167196	./base/pgsql_tmp
1836220	./base/16515
1507332	./pg_xlog
405392	./global
367768	./base/16514
308780	./base/16384
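+ Comparing the two outputs, nearly all of the extra space on gpseg3 sits under base/16517, i.e. inside a single database's data directory, which is what the resolution steps below drill into.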
All Greenplum Database versions.
Hosted on public cloud, private cloud, or on-premises.
+ Identify the largest relfilenode, or the one that stands out by size compared to other segments, in the segment data directory, as shown below.
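For example, the largest data files on the affected segment can be listed directly on the segment host (a sketch; base/16517 is the database directory taken from the du output above, and the trailing numbers in the file names are the relfilenodes; files larger than 1 GB are split into extents named <relfilenode>.1, <relfilenode>.2, and so on):
[gpadmin@ip-10-xxx-xxx-xxx gpseg3]$ du -a base/16517 | sort -nr | head -n 10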
+ Find the relation name from the relfilenode using the command (9104131 is an example relfilenode):
select gp_segment_id,relname,relfilenode from gp_dist_random('pg_class') where relfilenode=9104131;
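The schema name needed in the next step can be looked up in the same way (a sketch; 9104131 is again the example relfilenode, and the join assumes pg_namespace is consistent across the cluster, which is the normal case):
select distinct n.nspname, c.relname from gp_dist_random('pg_class') c join pg_namespace n on c.relnamespace = n.oid where c.relfilenode=9104131;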
+ Identify the relation size on all the segments using the command:
select gp_segment_id,pg_total_relation_size('<schema_name>.<identified relation name>') from gp_dist_random('gp_id');
+ The problematic segment should show up in the output as the odd one out.
+ There may be a few or more such relations, depending on the overall size of the relations involved; the query below can help narrow them down.
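If several relations are involved, a per-segment listing of the largest relations can find them in one pass (a sketch; it assumes pg_relation_size is evaluated on each segment when queried through gp_dist_random, and relkind='r' restricts the output to tables):
select gp_segment_id, relname, pg_relation_size(oid) as size_bytes from gp_dist_random('pg_class') where relkind='r' order by size_bytes desc limit 20;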
+ Check whether any orphaned processes are running on any of the segments using the command:
select * from gp_dist_random('pg_stat_activity');
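Note that the pg_stat_activity column names differ by major version (for example procpid/current_query in Greenplum 4.x and 5.x versus pid/query in 6.x). A session that shows up on a segment in this output but has no matching sess_id in pg_stat_activity on the master is a candidate orphan.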
+ Once identified, verify the PID of the orphaned process(es) on each segment host using gpssh:
gpssh -f <segment host file>
=> ps -ef | grep <PID>
+ Terminate the identified orphaned process on the affected segment host, as shown below.
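For example, the process can be killed over gpssh (a sketch; replace <segment host> and <PID> with the values identified above, and use plain kill (SIGTERM) rather than kill -9, which would force crash recovery on that segment):
gpssh -h <segment host> -e 'kill <PID>'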
+ Reclaim the space by running VACUUM FULL on the identified problematic tables.
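For example (a sketch; VACUUM FULL takes an ACCESS EXCLUSIVE lock on the table, so run it in a maintenance window):
vacuum full <schema_name>.<identified relation name>;
Afterwards, re-run the pg_total_relation_size query above to confirm the segment sizes are back in line.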