Greenplum Disaster Recovery (GPDR): "gpdr backup" and "gpdr restore" are slow



Article ID: 431113


Products

VMware Tanzu Data Suite, VMware Tanzu Greenplum, VMware Tanzu Greenplum / Gemfire

Issue/Introduction

Taking snapshots of the primary database/cluster with "gpdr backup", or restoring a snapshot on the DR database/cluster with "gpdr restore", can take a long time to complete.

 

Environment

Greenplum Database

Greenplum Disaster Recovery

Cause

Slow backups and restores can have several causes:

  • The connection to the GPDR backup repository may be slow. The repository can be in a number of different locations; see Configure the Primary Cluster for details.
  • If using cgroups v2 (gp_resource_manager=group-v2), the CPU limit set for the "system_group" may be too low.

Resolution

Repository connection

Test the read and write speeds to the repository from the segment hosts of the primary cluster if backups are slow, and from the segment hosts of the DR cluster if restores are slow.

If the repository type is "posix", use the "dd" command to read from and write to the repository. If the repository type is S3, you will need an appropriate client to read and write test files.
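A read/write check with "dd" for a posix repository could look like the sketch below. It is illustrative only: the function name repo_dd_test, the test file name, and the example path are assumptions and not part of GPDR, and the block assumes GNU dd on Linux. Run it as a user that can write to the repository directory.

```shell
# Hypothetical helper (not part of GPDR): time a sequential write and a
# sequential read of a test file inside a posix repository directory.
# Usage: repo_dd_test <repository_dir> [size_in_MiB]
repo_dd_test() {
  repo_dir=$1                      # directory on the repository mount
  size_mb=${2:-1024}               # test file size in MiB (default 1 GiB)
  test_file="$repo_dir/gpdr_dd_test.$$"

  if [ ! -d "$repo_dir" ]; then
    echo "not a directory: $repo_dir" >&2
    return 1
  fi

  echo "write test:"
  # conv=fsync makes dd flush the data to storage before it reports a rate
  if ! dd if=/dev/zero of="$test_file" bs=1M count="$size_mb" conv=fsync; then
    rm -f "$test_file"
    return 1
  fi

  echo "read test:"
  # On a local file system this read may be served from the page cache,
  # so treat the reported rate as an upper bound.
  dd if="$test_file" of=/dev/null bs=1M
  status=$?

  rm -f "$test_file"               # always remove the test file
  return $status
}
```

For example, "repo_dd_test /path/to/repository 2048" writes and reads a 2 GiB file; dd prints the transfer rate for each step on stderr. Comparing the rates from several segment hosts can show whether the slowness is specific to one host or common to the repository connection.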

You may need to engage your network administrators to check the network connection to the repository.

There is nothing that can be done within Greenplum if the connection to the repository is too slow.

Using cgroups v2 (gp_resource_manager=group-v2)

  • Increasing CPU_MAX_PERCENT for the system_group may help backups and restores run faster. The default is 10; increasing it to a higher value, for example 30, allows the cluster to use more CPU during backups and restores.
  • This may affect running queries when CPU usage on the hosts is very high. To avoid this, reduce the CPU_WEIGHT for the system_group to a low value relative to the CPU_WEIGHT of the other resource groups. See Assigning CPU Resource by Percentage for details on how this functions.
  • Create a restore point/backup of the primary after changing the settings.
  • Restore the restore point to the DR cluster. This will change the values in the resource group on the DR cluster to the same values as in the primary cluster.
  • If read-replica is enabled in the DR cluster, then restart the cluster for the resource group changes to take effect.
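The resource group change in the first two steps could be scripted roughly as follows. This is a sketch under assumptions: the helper name system_group_cpu_sql is hypothetical, the CPU_WEIGHT value of 50 is only an illustration (choose a value that is low relative to your other resource groups), and the generated statements are meant to be run on the primary cluster's coordinator by a superuser.

```shell
# Hypothetical helper: print the SQL that raises the CPU limit of
# system_group and lowers its CPU weight, followed by a query that
# shows the resulting resource group configuration.
# Usage: system_group_cpu_sql [cpu_max_percent] [cpu_weight]
system_group_cpu_sql() {
  max_percent=${1:-30}   # example value taken from the article
  weight=${2:-50}        # illustrative value only, an assumption

  # Accept only non-negative integers so malformed SQL is never emitted
  for value in "$max_percent" "$weight"; do
    case $value in
      ''|*[!0-9]*) echo "not an integer: $value" >&2; return 1 ;;
    esac
  done

  printf 'ALTER RESOURCE GROUP system_group SET CPU_MAX_PERCENT %s;\n' "$max_percent"
  printf 'ALTER RESOURCE GROUP system_group SET CPU_WEIGHT %s;\n' "$weight"
  printf 'SELECT * FROM gp_toolkit.gp_resgroup_config;\n'
}

# Example (run on the primary coordinator; the database name is an assumption):
#   system_group_cpu_sql 30 50 | psql -d postgres
```

Printing the statements first makes it possible to review them before piping them to psql. The final SELECT lists the current settings of every resource group, so the new system_group values can be compared against the CPU_WEIGHT of the other groups.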