gpsnap utility for AWS Marketplace not available to take the AWS backups

Article ID: 399305

Products

VMware Tanzu Data Suite, VMware Tanzu Greenplum, VMware Tanzu Greenplum / Gemfire

Issue/Introduction

Scenario -

+ You are migrating from Greenplum (GP) on AWS Marketplace to regular Greenplum with AWS EC2 instances as the cluster hosts/servers.

+ In the existing GP AWS Marketplace environment, you use the "gpsnap" tool to back up the GP database.

+ gpsnap is a utility provided by GP on AWS Marketplace. It requires the GP cluster to be shut down for a minute or so while it runs.

+ gpsnap is no longer available on AWS Marketplace, so you plan to use AWS Backup to back up the GP database.

 

Questions - 

  1. Does Greenplum require the cluster to be shut down during AWS backup of the Greenplum database?
  2. What happens if the GP cluster is not shut down during backup? Will the restored cluster function normally?

Resolution

The AWS Backup feature utilizes EBS volume snapshots to perform backup and restore operations. Functionally, this is similar to the gpsnap utility. You can configure an AWS Backup plan that includes all the EBS volumes associated with the Greenplum (GP) data disks across all GP instances, according to a defined backup schedule.
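As an illustration of such a backup plan, the following Python sketch builds the request payloads for the boto3 AWS Backup client calls create_backup_plan and create_backup_selection. It is a minimal sketch, not a tested procedure: the plan name, vault name, IAM role ARN, region, account ID, and volume IDs are placeholders, and the 11:00 PM cron schedule is an assumption that only mirrors the example timing used later in this article. The client is supplied by the caller, for example boto3.client("backup").

```python
def ebs_volume_arn(region, account_id, volume_id):
    """Build the ARN that AWS Backup uses to reference one EBS volume."""
    return f"arn:aws:ec2:{region}:{account_id}:volume/{volume_id}"


def build_backup_plan(plan_name, vault_name, schedule="cron(0 23 * * ? *)"):
    """Return a BackupPlan payload with a single daily rule.

    The default schedule (23:00 every day) is an assumption for
    illustration; confirm the time zone that applies to your plan.
    """
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": f"{plan_name}-daily",
                "TargetBackupVaultName": vault_name,
                "ScheduleExpression": schedule,
            }
        ],
    }


def build_backup_selection(selection_name, iam_role_arn, volume_arns):
    """Return a BackupSelection payload listing every GP data volume."""
    return {
        "SelectionName": selection_name,
        "IamRoleArn": iam_role_arn,
        "Resources": sorted(volume_arns),
    }


def create_plan(backup_client, plan, selection):
    """Create the plan, then attach the volume selection to it."""
    plan_id = backup_client.create_backup_plan(BackupPlan=plan)["BackupPlanId"]
    backup_client.create_backup_selection(
        BackupPlanId=plan_id, BackupSelection=selection
    )
    return plan_id
```

The selection must list the data volumes of the coordinator and of every segment host; a volume that is left out of the selection is not part of the recovery point.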

1. Does Greenplum require the cluster to be shut down during AWS backup of the Greenplum database?

Yes, it is strongly recommended to stop the Greenplum database before initiating an AWS backup or snapshot. This ensures data consistency.

 

2. What happens if the GP cluster is not shut down during backup? Will the restored cluster function normally?

If the database is not shut down or paused before the snapshot, the backup may not capture a fully consistent state. This could potentially result in issues during recovery. In such cases, upon restoration, the Greenplum database might require a longer recovery time due to the need to roll back uncommitted transactions or complete recovery procedures.

Restoration using AWS Backup is currently a manual process, but it can be automated with simple scripts. It involves detaching the original EBS volumes and attaching the restored volumes from the AWS Backup job. Once the restored volumes are attached, the Greenplum database can be started from the recovery point.
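The detach-and-attach step could be scripted along the following lines. This is a hedged sketch using the boto3 EC2 client calls detach_volume and attach_volume and the volume_available and volume_in_use waiters. It assumes that the Greenplum cluster is stopped, that the file system on the original volume is already unmounted, and that the restored volume exists in the same Availability Zone as the instance. The function name and all IDs are hypothetical.

```python
def swap_volume(ec2, instance_id, device, old_volume_id, restored_volume_id):
    """Replace one data volume on one host with its restored copy.

    `ec2` is an EC2 client supplied by the caller, for example
    boto3.client("ec2"). All IDs and the device name are placeholders.
    """
    # Detach the original volume and wait until it is fully released.
    ec2.detach_volume(VolumeId=old_volume_id, InstanceId=instance_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[old_volume_id])

    # Attach the restored volume under the same device name so that
    # existing mount configuration on the host still applies.
    ec2.attach_volume(
        VolumeId=restored_volume_id, InstanceId=instance_id, Device=device
    )
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[restored_volume_id])


# Hypothetical usage, repeated for every data volume on every GP host:
# swap_volume(ec2, "i-0123456789abcdef0", "/dev/sdf",
#             "vol-0original", "vol-0restored")
```

All hosts should be restored from the same recovery point before the cluster is started, so that the coordinator and segments agree on the state of the database.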

Additional Information

+ It's not necessary to wait for the entire AWS backup process to complete before restarting the Greenplum cluster. However, timing remains important to ensure data consistency and backup integrity.

+ According to AWS Documentation, Amazon EBS snapshots are point-in-time backups of your volumes. Snapshot creation is asynchronous: while the snapshot enters a pending state immediately upon initiation, the actual data transfer to Amazon S3 may take several hours to complete, depending on the volume size and change rate. Importantly, the snapshot captures the state of the volume at the moment it's requested, and you can safely resume writes to the volume once the snapshot has been triggered.

 

Recommendation - 

  • Wait for all EBS volume snapshots to reach the pending (in-progress) state; this confirms that each snapshot has been triggered.
  • This typically occurs within 2-3 minutes of the scheduled backup time (for example, by 11:03 PM if the backup starts at 11:00 PM).
  • You can monitor this using the AWS CLI ("aws ec2 describe-snapshots") or CloudWatch Events (now Amazon EventBridge).
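The check described in this list can also be done programmatically. The sketch below uses the boto3 EC2 client call describe_snapshots, the API behind the "aws ec2 describe-snapshots" command, to report whether every GP data volume has a snapshot that was started after a given time. It is an illustration under assumptions: the function names are hypothetical, the volume IDs are placeholders, and started_after must be a timezone-aware datetime because boto3 returns StartTime as one.

```python
# A snapshot counts as triggered once it is pending or already completed.
TRIGGERED_STATES = {"pending", "completed"}


def volumes_with_triggered_snapshot(ec2, volume_ids, started_after):
    """Return the volume IDs that have a snapshot started after the cutoff."""
    triggered = set()
    kwargs = {"Filters": [{"Name": "volume-id", "Values": list(volume_ids)}]}
    while True:
        page = ec2.describe_snapshots(**kwargs)
        for snap in page["Snapshots"]:
            recent = snap["StartTime"] >= started_after
            if recent and snap["State"] in TRIGGERED_STATES:
                triggered.add(snap["VolumeId"])
        token = page.get("NextToken")
        if not token:
            return triggered
        kwargs["NextToken"] = token  # fetch the next page of results


def all_snapshots_triggered(ec2, volume_ids, started_after):
    """True only when every listed volume has a triggered snapshot."""
    found = volumes_with_triggered_snapshot(ec2, volume_ids, started_after)
    return set(volume_ids) <= found
```

Passing the scheduled backup time as started_after keeps snapshots from earlier backup runs from being counted by mistake.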

 

+ Once all snapshots are in the pending state, you can safely restart the Greenplum cluster. Waiting until then ensures that no data changes are written before the snapshots are initiated.

Estimated safe restart time: around 11:05 PM, assuming:

  • The Greenplum cluster shuts down cleanly by 10:58 PM.
  • AWS Backup starts as scheduled at 11:00 PM.
  • Snapshot initiation occurs within a few minutes. 

There’s no need to wait for the full 1-2 hour snapshot process to finish; just ensure that the snapshots have started properly.
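The stop, wait, and restart sequence can be sketched as a small wrapper script. This is an assumption-laden illustration rather than a supported tool: it assumes it runs as the Greenplum administrative user on the coordinator host with the Greenplum environment loaded, it uses "gpstop -a -M fast" and "gpstart -a" to stop and start the cluster without prompting, and snapshots_triggered is a hypothetical callable supplied by the caller that returns True once every EBS snapshot has been initiated. The polling interval and timeout are arbitrary example values.

```python
import subprocess
import time


def run_backup_window(snapshots_triggered, run=subprocess.run,
                      sleep=time.sleep, poll_seconds=30, timeout_seconds=900):
    """Stop Greenplum, wait for snapshot initiation, then restart it.

    The cluster is restarted even if the wait times out, so that it does
    not stay down; the TimeoutError then signals that this backup may
    not be consistent and should not be relied on.
    """
    run(["gpstop", "-a", "-M", "fast"], check=True)
    waited = 0
    try:
        while not snapshots_triggered():
            if waited >= timeout_seconds:
                raise TimeoutError("snapshots were not initiated in time")
            sleep(poll_seconds)
            waited += poll_seconds
    finally:
        run(["gpstart", "-a"], check=True)
```

Scheduling this script to start shortly before the AWS Backup schedule (10:58 PM in the example above) keeps the downtime close to the few minutes that snapshot initiation takes.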