Why there is too many gpload staging tables and how to clean up them?
search cancel

Why there is too many gpload staging tables and how to clean up them?

book

Article ID: 296783

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

Sometimes, we may see many tables with names like "ext_gpload_reusable_<a_long_value>:
gpadmin=# select * from pg_tables where tablename like 'ext_gpload_reusable%';
 schemaname |                        tablename                         | tableowner | tablespace | hasindexes | hasrules | hastriggers 
------------+----------------------------------------------------------+------------+------------+------------+----------+-------------
 public     | ext_gpload_reusable_38b1f168_ed39_11ed_b27e_0050569eb631 | gpadmin    |            | f          | f        | f
 public     | ext_gpload_reusable_3a954ad8_ed3a_11ed_acdc_0050569eb631 | gpadmin    |            | f          | f        | f
 public     | ext_gpload_reusable_50910584_ed3a_11ed_ac81_0050569eb631 | gpadmin    |            | f          | f        | f 
Actually these tables are staging tables produced by gpload jobs. Each gpload job will generate one such table with a HASH Value as the suffix in its table name unless we define a custom name using the Field "STAGING_TABLE" in the yaml conf file.
These tables will not be dropped automatically when gpload jobs finish if "REUSE_TABLE" is set TRUE.

Environment

Product Version: 6.19

Resolution

There is one important point that if we just change a bit in the config file, for example, update ERROR_LIMIT to 20 from 25 without changing anything else, a rerun of the gpload will have to create a new staging table without reusing the old one:
2023-05-08 09:07:14|INFO|did not find an external table to reuse. creating ext_gpload_reusable_ab7bc554_ed3c_11ed_a74b_0050569eb631 
That should be able to explain the main reason why sometimes we see many staging tables remained in the DB.

Normally these tables will not have significant impact. If we find them annoying, an easy way to drop them is just to use the below script to generate a DROP list for these staging tables:
gpadmin=# select 'DROP EXTERNAL TABLE '||tablename||';' from pg_tables where tablename like 'ext_gpload_reusable%';
                                   ?column?                                    
-------------------------------------------------------------------------------
 DROP EXTERNAL TABLE ext_gpload_reusable_38b1f168_ed39_11ed_b27e_0050569eb631;
 DROP EXTERNAL TABLE ext_gpload_reusable_3a954ad8_ed3a_11ed_acdc_0050569eb631;
 DROP EXTERNAL TABLE ext_gpload_reusable_50910584_ed3a_11ed_ac81_0050569eb631;
 DROP EXTERNAL TABLE ext_gpload_reusable_ab7bc554_ed3c_11ed_a74b_0050569eb631; 
IMPORTANT: Please double check the list just in case any user tables in it before dropping them.

Please note there might be a new feature on this in gpload future releases.