Scenario -
+ You are migrating from Greenplum (GP) on AWS Marketplace to a standard Greenplum installation that uses AWS EC2 instances as the cluster hosts.
+ In the existing GP AWS Marketplace environment, the "gpsnap" tool is used to back up the GP database.
+ gpsnap is a utility provided with GP on AWS Marketplace. It requires the GP cluster to be shut down for a minute or so while it runs.
+ Because gpsnap is no longer available on AWS Marketplace, the plan is to use AWS Backup for GP database backups.
Questions -
The AWS Backup feature utilizes EBS volume snapshots to perform backup and restore operations. Functionally, this is similar to the gpsnap utility. You can configure an AWS Backup plan that includes all the EBS volumes associated with the Greenplum (GP) data disks across all GP instances, according to a defined backup schedule.
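As a minimal sketch of such a plan, the Python below builds the structures passed to the AWS Backup client calls `create_backup_plan` and `create_backup_selection` (boto3), selecting the GP data volumes by tag. The plan name, vault name, tag key/value, IAM role ARN, and schedule are all hypothetical; the client is passed in so the structures can be checked without AWS credentials. A scheduled rule fires whether or not Greenplum is stopped, so the cluster shutdown still has to be timed to the backup window.

```python
"""Sketch: an AWS Backup plan that selects the Greenplum data volumes by tag.

All names (plan, vault, tag key/value, role ARN, schedule) are hypothetical.
"""


def build_backup_plan(plan_name, vault_name, schedule_cron):
    """Return the BackupPlan structure expected by create_backup_plan."""
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": f"{plan_name}-daily",
                "TargetBackupVaultName": vault_name,
                # AWS Backup cron syntax, e.g. "cron(0 23 * * ? *)" = 23:00 UTC daily.
                "ScheduleExpression": schedule_cron,
            }
        ],
    }


def build_selection(selection_name, role_arn, tag_key, tag_value):
    """Select every resource tagged tag_key=tag_value (tag only the GP data volumes)."""
    return {
        "SelectionName": selection_name,
        "IamRoleArn": role_arn,
        "ListOfTags": [
            {
                "ConditionType": "STRINGEQUALS",
                "ConditionKey": tag_key,
                "ConditionValue": tag_value,
            }
        ],
    }


def create_gp_backup_plan(backup_client, plan, selection):
    """Create the plan and attach the selection; backup_client is boto3.client("backup")."""
    plan_id = backup_client.create_backup_plan(BackupPlan=plan)["BackupPlanId"]
    backup_client.create_backup_selection(BackupPlanId=plan_id, BackupSelection=selection)
    return plan_id
```

Tagging only the data volumes (not the instances) keeps the selection limited to the disks that hold GP data.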
1. Does Greenplum require the cluster to be shut down during AWS backup of the Greenplum database?
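Yes. Stopping the Greenplum database before initiating an AWS Backup job (or any EBS snapshot) is strongly recommended to ensure data consistency. The snapshots of the data volumes on different hosts are not taken at the same instant, so on a running cluster the coordinator and segments could be captured at different points in time.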
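A minimal sketch of the stop/start steps, assuming it runs on the coordinator host as the Greenplum admin user with the Greenplum environment already sourced. It wraps the standard `gpstop` and `gpstart` utilities; choosing `-M fast` (interrupt and roll back in-flight transactions instead of waiting for sessions to end) is an assumption, and the default smart mode may suit a quiet maintenance window better. The `runner` parameter exists only so the commands can be checked without a live cluster.

```python
"""Sketch: stop and restart Greenplum around a backup, run on the coordinator host."""

import subprocess

# -a: do not prompt for confirmation; -M fast: roll back in-flight transactions
# and shut down without waiting for client sessions to disconnect.
STOP_CMD = ["gpstop", "-a", "-M", "fast"]
# -a: do not prompt for confirmation.
START_CMD = ["gpstart", "-a"]


def run(cmd, runner=subprocess.run):
    """Run a Greenplum utility; with subprocess.run, a non-zero exit raises CalledProcessError."""
    return runner(cmd, check=True)


def stop_cluster(runner=subprocess.run):
    """Shut down the whole cluster before the snapshots are triggered."""
    return run(STOP_CMD, runner)


def start_cluster(runner=subprocess.run):
    """Start the cluster again once every snapshot has been initiated."""
    return run(START_CMD, runner)
```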
2. What happens if the GP cluster is not shut down during backup? Will the restored cluster function normally?
If the database is not shut down or paused before the snapshot, the backup may not capture a fully consistent state, which could cause problems during recovery. In such cases, the restored Greenplum database may take longer to start, because it must roll back uncommitted transactions and complete crash recovery first.
Restoring from AWS Backup is currently a manual process, although it can be automated with simple scripts. It involves stopping Greenplum, detaching the original EBS volumes, and attaching the volumes restored from the AWS Backup recovery points. Once the restored volumes are attached and mounted at their original paths, the Greenplum database can be started from the recovery point.
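The detach/attach step could be scripted roughly as below, using the boto3 EC2 calls `detach_volume` and `attach_volume` and the built-in `volume_available` / `volume_in_use` waiters. This is a sketch under stated assumptions: Greenplum is already stopped, the filesystems are unmounted, each restored volume was created in the same Availability Zone as its instance, and the instance ID, device name, and volume IDs are hypothetical inputs (for example, taken from the AWS Backup restore job).

```python
"""Sketch: swap original data volumes for volumes restored by AWS Backup.

Assumes Greenplum is stopped and the filesystems are unmounted on each host first.
Instance IDs, device names, and volume IDs are hypothetical inputs.
"""


def swap_volume(ec2, instance_id, device, old_volume_id, restored_volume_id):
    """Detach old_volume_id from device and attach restored_volume_id in its place.

    ec2 is a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    ec2.detach_volume(VolumeId=old_volume_id, InstanceId=instance_id, Device=device)
    # Block until the old volume is fully detached before reusing the device name.
    ec2.get_waiter("volume_available").wait(VolumeIds=[old_volume_id])

    ec2.attach_volume(VolumeId=restored_volume_id, InstanceId=instance_id, Device=device)
    # Block until the restored volume reports the in-use state.
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[restored_volume_id])


def swap_all(ec2, mappings):
    """mappings: iterable of (instance_id, device, old_volume_id, restored_volume_id)."""
    for instance_id, device, old_id, new_id in mappings:
        swap_volume(ec2, instance_id, device, old_id, new_id)
```

The old volumes are only detached, not deleted, so they remain available as a fallback until the restored cluster has been verified.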
+ It is not necessary to wait for the entire AWS Backup job to complete before restarting the Greenplum cluster. However, the restart timing still matters: the cluster must stay stopped until the snapshot of every data volume has been initiated.
+ According to the AWS documentation, Amazon EBS snapshots are point-in-time backups of your volumes. Snapshot creation is asynchronous: the snapshot enters the pending state immediately upon initiation, but the actual data transfer to Amazon S3 may take several hours to complete, depending on the volume size and change rate. Importantly, the snapshot captures the state of the volume at the moment it is initiated, so you can safely resume writes to the volume once the snapshot has been triggered.
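To illustrate the asynchronous behavior, the sketch below triggers snapshots directly with the boto3 EC2 call `create_snapshot`, which returns as soon as the snapshot is initiated. AWS Backup issues its own snapshots, so this direct call is only an illustration (or a possible manual alternative), not what AWS Backup runs internally; the volume IDs and description are hypothetical.

```python
"""Sketch: trigger EBS snapshots directly and observe that the call returns at once.

ec2 is a boto3 EC2 client (boto3.client("ec2")); volume IDs are hypothetical.
"""


def start_snapshots(ec2, volume_ids, description="greenplum data volume"):
    """Start one snapshot per volume and return {volume_id: (snapshot_id, state)}.

    create_snapshot returns as soon as the snapshot is initiated; the state in the
    response is normally "pending" while the data copy continues in the background.
    """
    started = {}
    for volume_id in volume_ids:
        resp = ec2.create_snapshot(VolumeId=volume_id, Description=description)
        started[volume_id] = (resp["SnapshotId"], resp["State"])
    return started
```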
Recommendation -
+ Once all snapshots are in the pending state, you can safely restart the Greenplum cluster. Waiting for this ensures that no data changes are written to any volume before its snapshot has been initiated.
Estimated safe restart time: around 11:05 PM, assuming:
There is no need to wait for the full 1-2 hour snapshot process to finish; just ensure that every snapshot has started properly.
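Checking that every snapshot has started can be sketched with the boto3 EC2 call `describe_snapshots`, filtering by `volume-id` and ignoring snapshots that began before the cluster was stopped. The volume IDs, the `since` timestamp (assumed to be the timezone-aware UTC time at which `gpstop` finished), the polling interval, and the timeout are hypothetical; pagination of `describe_snapshots` is ignored for brevity.

```python
"""Sketch: wait until every Greenplum data volume has a snapshot that has started.

ec2 is a boto3 EC2 client; volume IDs and the stop time are hypothetical inputs.
"since" must be timezone-aware (UTC), like the StartTime values boto3 returns.
"""

import time

STARTED_STATES = {"pending", "completed"}


def snapshots_started(ec2, volume_ids, since):
    """True once each volume has a snapshot with StartTime >= since in a started state."""
    for volume_id in volume_ids:
        # Pagination is ignored for brevity.
        resp = ec2.describe_snapshots(
            OwnerIds=["self"],
            Filters=[{"Name": "volume-id", "Values": [volume_id]}],
        )
        recent = [s for s in resp["Snapshots"] if s["StartTime"] >= since]
        if any(s["State"] == "error" for s in recent):
            raise RuntimeError(f"snapshot failed for {volume_id}")
        if not any(s["State"] in STARTED_STATES for s in recent):
            return False
    return True


def wait_until_started(ec2, volume_ids, since, poll_seconds=15, timeout=1800, sleep=time.sleep):
    """Poll until all snapshots have started; raise TimeoutError after timeout seconds."""
    waited = 0
    while not snapshots_started(ec2, volume_ids, since):
        if waited >= timeout:
            raise TimeoutError("not all snapshots started in time")
        sleep(poll_seconds)
        waited += poll_seconds
```

Once `wait_until_started` returns without raising, the cluster can be restarted (for example with `gpstart -a`). If it raises, that backup run is incomplete and should be investigated.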