1. Steps taken during a Greenplum full backup
When a full database backup is taken, the utility takes the following steps behind the scenes:
- The syntax of the backup command is checked
- The existence of the location for the backup is checked, if not available its created
- The location where the backup is about to be taken, the available size is checked and compared with the database backup size
- Once the above checks are done the backup is started.
- An Exclusive lock is acquired on
pg_class
, so that there is no modification to the DDL and its dumped.
- Once the DDL backup completes, it acquires a lock on all the user tables in Access Share Mode.
- The lock on
pg_class
is released.
- After that, backup agent is sent to all the segments that are part of the Greenplum cluster
- The backup agents then start to dump each segment data in parallel
- Once it completes, the agents exits and the locks on the tables are released.
- The backup is now completed.
2. Steps taken during a Greenplum incremental backup
When an incremental backup is taken, the utility takes the following steps behind the scenes:
- Every time a backup is run on GPDB, the backup utility creates state files for Append Optimized (AO) tables.
- These state files are created in order to store the tuple/row counts for each AO table/partition in the database.
- These state files also store the metadata operations (truncate, alter, etc.) performed on any of the tables in the database.
- No state files are created for heap tables since they get backed up with every incremental backup.
- When the user performs an incremental backup, the backup utility compares the state of each table in the database against the last backup using the state files.
- If there is any change in the state of the table since the last backup, it is marked as dirty and is backed up during an incremental backup.
- This procedure is followed only for AO tables.
3. Different backup and restore utilities available
The different tools to perform a backup or restore are:
4. How to generate DDL for a table
Use the pg_dump utility to generate DDL.
Example:
pg_dump -d db01 -t schema01.table01 -s -f myBackup.sql
Where:
"-d db01", specifies database "db01"
"-t schema01.table01" specifies to backup table "table01" in schema "schema01"
"-s" specifies to dump only the DDL of the table, no data is dumped
"-f myBackup.sql" specifies to write the backup to file "myBackup.sql"
5. How to back up just the schema for all databases in the cluster
The pg_dumpall
utility extracts all databases in a GPDB system to a single script file. It will generate the DDL for all objects, including global objects such as resource queues and roles.
Example:
$ pg_dumpall --gp-syntax --schema-only -f db_dump.sql
6. How to back up just the schema for a single database in the cluster
The pg_dump utility dumps
Example:
$ pg_dump -d db01 -s -f db01_schema_only.sql
7. How to back up just the data of a table
The following command will create a data dump of the "db01" database
$ pg_dump -a -d db01 -t schema01.table01 -f table01_data_only.txt
8. How to clone a database in the same cluster
The following command will create a full copy of database "db01" into a new database "db01_test".
# CREATE DATABASE db01_test TEMPLATE db01;
9. How to copy data to a different Greenplum cluster