gpcrondump Important Flags and File Layouts

Products

VMware Tanzu Greenplum

Issue/Introduction

This article gives an overview of how a GPDB dump works, the layout of the backup files it creates, and the important flags to note.

Environment

Resolution

How GPDB dump works

-gpcrondump: This utility writes a database out to SQL script files, which can be used to restore the database with the gpdbrestore utility. gpcrondump can be called directly or from a crontab entry.

-gp_dump: This is called by gpcrondump. gpcrondump is a wrapper around gp_dump that performs basic validations before starting gp_dump.

Files

The master first writes to /home/gpadmin/gpAdminLogs/gpcrondump_YYYYMMDD.log and continues writing to this log file until the message "Starting Dump process" appears. It then switches to writing a status file (for example, gp_dump_status_0_2_<timestamp>) on the master and on each segment. Once the database backup is complete, it creates a file containing the CREATE DATABASE command, on the master only.
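As a minimal sketch, assuming you want to check how far a run got, the Python below builds the expected daily log file name and looks for the "Starting Dump process" marker described above. The helper names are hypothetical, and the log directory is passed in rather than hard-coded (on a default install it is gpAdminLogs under the gpadmin home directory).

```python
from datetime import date
from pathlib import Path
from typing import Iterable

# Marker described in the article: after this line, progress is written to the
# gp_dump_status_* files rather than to the gpcrondump log.
DUMP_START_MARKER = "Starting Dump process"


def crondump_log_path(log_dir: str, day: date) -> Path:
    """Hypothetical helper: <log_dir>/gpcrondump_YYYYMMDD.log for the given day."""
    return Path(log_dir) / f"gpcrondump_{day:%Y%m%d}.log"


def reached_dump_phase(log_lines: Iterable[str]) -> bool:
    """Hypothetical helper: True once the log shows the dump process started."""
    return any(DUMP_START_MARKER in line for line in log_lines)


if __name__ == "__main__":
    path = crondump_log_path("/home/gpadmin/gpAdminLogs", date.today())
    if path.exists():
        with path.open() as log:
            print(path, "dump started:", reached_dump_phase(log))
    else:
        print(path, "not found")
```

If the marker never appears, the run stopped during gpcrondump's own validation, so the master log (not the status files) is the place to look for the error.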

Under the $MASTER_DATA_DIRECTORY/db_dumps/yyyymmdd folder, the following files should be present (in addition to the gzipped dump files):

gp_dump_1_1_20160414091125.gz - Contains the DDL for the database, including schemas, functions, and tables (but not indexes and constraints)

gp_dump_1_1_20160414091125_post_data.gz - DDL for dropping and recreating constraints and indexes

gp_cdatabase_1_1_20160413141720 - Contains the CREATE DATABASE command

gp_dump_status_1_1_20160413141720 - Status file with high-level logging for the gpcrondump job

gp_dump_20160413141720.rpt - Report with a one-line status for each segment indicating whether the backup succeeded

gp_dump_20160413141720_ao_state_file - File offsets of append-optimized (AO) tables, used for incremental backups

gp_dump_20160413141720_co_state_file - File offsets of column-oriented (CO) tables, used for incremental backups

gp_dump_20160413141720_last_operation - Last DDL operation performed on each relation
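The file names above follow a regular pattern, so a directory listing can be sorted into these categories with a few regular expressions. This is a sketch that assumes the names look exactly like the examples, with a 14-digit YYYYMMDDHHMMSS timestamp; the category labels are made up for illustration, and the "dump_gz" pattern also matches the gzipped segment dump files.

```python
import re
from datetime import datetime

# Hypothetical category labels mapped to patterns for the files listed above.
# Each pattern captures the 14-digit YYYYMMDDHHMMSS dump timestamp.
DUMP_FILE_PATTERNS = [
    ("post_data", re.compile(r"^gp_dump_\d+_\d+_(\d{14})_post_data\.gz$")),
    ("dump_gz", re.compile(r"^gp_dump_\d+_\d+_(\d{14})\.gz$")),
    ("create_database", re.compile(r"^gp_cdatabase_\d+_\d+_(\d{14})$")),
    ("status", re.compile(r"^gp_dump_status_\d+_\d+_(\d{14})$")),
    ("report", re.compile(r"^gp_dump_(\d{14})\.rpt$")),
    ("ao_state", re.compile(r"^gp_dump_(\d{14})_ao_state_file$")),
    ("co_state", re.compile(r"^gp_dump_(\d{14})_co_state_file$")),
    ("last_operation", re.compile(r"^gp_dump_(\d{14})_last_operation$")),
]


def classify_dump_file(name):
    """Return (category, timestamp) for a recognised file name, else None."""
    for category, pattern in DUMP_FILE_PATTERNS:
        match = pattern.match(name)
        if match:
            return category, datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return None


if __name__ == "__main__":
    for name in ["gp_dump_1_1_20160414091125.gz", "gp_dump_20160413141720.rpt"]:
        print(name, classify_dump_file(name))
```

Grouping the results by timestamp is one way to spot a backup set that is missing, for example, its .rpt or post_data file.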

Each segment data directory should contain a db_dumps/yyyymmdd folder with the following file (in addition to the gzipped dump files):

gp_dump_status_0_6_20160413141720 - Status file with high-level logging for the gpcrondump job on that segment
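When checking a specific segment, it can help to compute where its status file should be. The sketch below assumes the naming seen in the examples: the leading 0 appears to mark a segment (the master file uses 1_1), the next number appears to be the segment's dbid, and the dated folder is the first eight digits of the dump timestamp. The helper name and the data directory in the usage line are hypothetical.

```python
from pathlib import Path


def segment_status_file(seg_datadir, dbid, timestamp):
    """Hypothetical helper: expected gp_dump_status path on one segment.

    Assumes <seg_datadir>/db_dumps/<YYYYMMDD>/gp_dump_status_0_<dbid>_<timestamp>,
    where YYYYMMDD is taken from the 14-digit dump timestamp.
    """
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("expected a YYYYMMDDHHMMSS timestamp")
    folder = Path(seg_datadir) / "db_dumps" / timestamp[:8]
    return folder / f"gp_dump_status_0_{dbid}_{timestamp}"


if __name__ == "__main__":
    # Hypothetical segment data directory for dbid 6.
    print(segment_status_file("/data/primary/gpseg4", 6, "20160413141720"))
```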

Important Flags

The following are important flags to note:

-B <parallel_processes> - The number of segments to check in parallel (default: 60)
-c (clear old dump files first) - Delete old backups before performing the backup
-g (copy config files) - Copy postgresql.conf, pg_ident.conf, and pg_hba.conf (on both the master and the segments)
-G (dump global objects) - Use pg_dumpall to dump global objects such as roles and tablespaces
-h (record dump details) - Record details of the database dump in the table public.gpcrondump_history
-j (vacuum before dump) - Run VACUUM before the dump starts
-k (vacuum after dump) - Run VACUUM after the dump has completed successfully
-o (clear old dump files only) - Clear out old dump files only, but do not run a dump
--oids - Include object identifiers (OIDs) in the dump data
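A minimal sketch of assembling a gpcrondump command line from these flags in Python. It only builds the argument list and does not run anything; the function and parameter names are hypothetical. It adds -x <database>, which names the database to dump and is not in the list above, and it leaves out -o because that flag skips the dump entirely.

```python
import shlex


def build_gpcrondump_cmd(dbname, parallel=None, clear_old=False,
                         copy_config=False, dump_globals=False,
                         record_history=False, vacuum_before=False,
                         vacuum_after=False, oids=False):
    """Hypothetical helper: return a gpcrondump argument list for one database."""
    # -x names the database to dump (not covered in the flag list above).
    args = ["gpcrondump", "-x", dbname]
    if parallel is not None:
        args += ["-B", str(parallel)]
    # Boolean options from the flag list, appended in a fixed order.
    optional_flags = [
        (clear_old, "-c"),
        (copy_config, "-g"),
        (dump_globals, "-G"),
        (record_history, "-h"),
        (vacuum_before, "-j"),
        (vacuum_after, "-k"),
        (oids, "--oids"),
    ]
    args += [flag for enabled, flag in optional_flags if enabled]
    return args


if __name__ == "__main__":
    cmd = build_gpcrondump_cmd("sales", clear_old=True, copy_config=True,
                               dump_globals=True)
    print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) on the master host. Building a list rather than a single string avoids shell-quoting problems with unusual database names.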

Additional Information

Environment:

Pivotal Greenplum 4.3.x
Operating System: Red Hat Enterprise Linux 6.x