This article gives an overview of how the GPDB dump works, the layout of the backup files it produces, and the important flags to note.
How GPDB dump works
-gpcrondump: This writes out a database to SQL script files. The script files can be used to restore the database using the gpdbrestore utility. The gpcrondump utility can be called directly or from a crontab entry.
-gp_dump: This is called internally by gpcrondump. gpcrondump is a wrapper around gp_dump that performs basic validations before starting gp_dump.
Files
The master first writes to /home/gpadmin/gpAdminLogs/gpcrondump_YYYYMMDD.log and keeps writing to this log file until the message "Starting Dump process" appears. It then switches to the status file (gp_dump_status_0_2_<timestamp>) on the master and all the segments. Once the database backup is completed, it creates a "create database" command file on the master only.
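As a rough illustration of the hand-off described above, the following Python sketch builds the dated log file name and checks whether a log has reached the "Starting Dump process" message. The function names are hypothetical helpers, not part of GPDB, and the log directory is only the default mentioned in this article; adjust it to match your system.

```python
from datetime import date
from pathlib import Path

# Assumption: default log directory as described in the article.
DEFAULT_LOG_DIR = Path("/home/gpadmin/gpAdminLogs")

# Message after which logging moves from this log to the status files.
HANDOFF_MARKER = "Starting Dump process"


def gpcrondump_log_path(day: date, log_dir: Path = DEFAULT_LOG_DIR) -> Path:
    """Return the gpcrondump_YYYYMMDD.log path for the given day."""
    return log_dir / f"gpcrondump_{day:%Y%m%d}.log"


def reached_dump_phase(log_text: str) -> bool:
    """True if the log text already contains the hand-off message."""
    return HANDOFF_MARKER in log_text
```

If reached_dump_phase returns True for the contents of the day's log, further progress should be looked for in the gp_dump_status files rather than in the gpcrondump log.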
Under the $MASTER_DATA_DIRECTORY/db_dumps/yyyymmdd folder, we should see the following files (in addition to the gzipped dump files):
gp_dump_1_1_20160414091125.gz - Contains the complete DDL for the database, including schemas, functions, and tables (but not indexes and constraints)
gp_dump_1_1_20160414091125_post_data.gz - DDL for dropping and recreating constraints and indexes
gp_cdatabase_1_1_20160413141720 - Contains the CREATE DATABASE command
gp_dump_status_1_1_20160413141720 - Status file that has high-level logging on the gpcrondump job
gp_dump_20160413141720.rpt - Report file with a one-line status showing whether the backup succeeded on each segment
gp_dump_20160413141720_ao_state_file - File offsets of AO (append-optimized) tables for incremental recovery
gp_dump_20160413141720_co_state_file - File offsets of CO (column-oriented) tables for incremental recovery
gp_dump_20160413141720_last_operation - Last DDL on each relation
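The naming pattern in the list above can be turned into a quick completeness check for a given dump timestamp. The sketch below is a hypothetical helper, not a GPDB utility. It assumes the master's files carry the 1_1 prefix shown in the examples and that all eight files are expected, which may not hold for every backup type or GPDB version.

```python
from pathlib import Path


def expected_master_files(timestamp: str) -> list[str]:
    """File names expected on the master for one dump timestamp.

    Assumption: the "1_1" prefix from the article's examples applies.
    """
    return [
        f"gp_dump_1_1_{timestamp}.gz",            # DDL without indexes/constraints
        f"gp_dump_1_1_{timestamp}_post_data.gz",  # constraints and indexes
        f"gp_cdatabase_1_1_{timestamp}",          # CREATE DATABASE command
        f"gp_dump_status_1_1_{timestamp}",        # high-level status log
        f"gp_dump_{timestamp}.rpt",               # per-segment success report
        f"gp_dump_{timestamp}_ao_state_file",     # AO table file offsets
        f"gp_dump_{timestamp}_co_state_file",     # CO table file offsets
        f"gp_dump_{timestamp}_last_operation",    # last DDL on each relation
    ]


def missing_master_files(dump_dir: Path, timestamp: str) -> list[str]:
    """Return the expected file names that are not present in dump_dir."""
    return [
        name
        for name in expected_master_files(timestamp)
        if not (dump_dir / name).exists()
    ]
```

For example, calling missing_master_files with the $MASTER_DATA_DIRECTORY/db_dumps/yyyymmdd folder and the dump timestamp returns an empty list when every file listed above is present.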
On the segment data directory, there should be a db_dumps/yyyymmdd folder and the following files should be present (in addition to the gzipped dump files):
gp_dump_status_0_6_20160413141720 - Status file that has high level logging on the gpcrondump job
Important Flags
The following are important flags to note:
-B <parallel_processes> - The number of segments to check in parallel (default: 60)
-c (clear old dump files first) - Specify this option to delete old backups before performing a backup
-g (copy config files) - Copy postgresql.conf, pg_ident.conf, and pg_hba.conf (both master and segments)
-G (dump global objects) - Use pg_dumpall to dump global objects such as roles and tablespaces
-h (record dump details) - Record details of the database dump in the table public.gpcrondump_history
-j (vacuum before dump) - Run VACUUM before the dump starts
-k (vacuum after dump) - Run VACUUM after the dump has completed successfully
-o (clear old dump files only) - Clear out old dump files only, but do not run a dump
--oids - Include object identifiers (oid) in dump data
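To show how these flags combine, here is a minimal Python sketch that assembles a gpcrondump argument list without running it. The function and its parameter names are hypothetical. The -x (database to dump) and -a (do not prompt) options are not described in this article; they are assumed from gpcrondump's usage and should be verified against your GPDB version.

```python
from typing import Optional


def build_gpcrondump_args(
    database: str,
    *,
    parallel_checks: Optional[int] = None,  # -B
    clear_old: bool = False,                # -c
    copy_config: bool = False,              # -g
    dump_globals: bool = False,             # -G
    record_history: bool = False,           # -h
    vacuum_before: bool = False,            # -j
    vacuum_after: bool = False,             # -k
) -> list[str]:
    """Build an argument list for gpcrondump; nothing is executed here."""
    # Assumption: -x names the database and -a suppresses the prompt.
    args = ["gpcrondump", "-x", database, "-a"]
    if parallel_checks is not None:
        args += ["-B", str(parallel_checks)]
    # Append each simple switch only when its option is enabled.
    switches = [
        (clear_old, "-c"),
        (copy_config, "-g"),
        (dump_globals, "-G"),
        (record_history, "-h"),
        (vacuum_before, "-j"),
        (vacuum_after, "-k"),
    ]
    args += [flag for enabled, flag in switches if enabled]
    return args
```

The resulting list could be passed to subprocess.run on the master host, or joined into a single line for a crontab entry.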