Select queries on GPDR read replica cluster fail during restore point application

Article ID: 422195

Updated On:

Products

VMware Tanzu Data Suite

Issue/Introduction

Select queries fail in Greenplum Disaster Recovery (GPDR) while a new restore point is being applied on the read replica side. We apply restore points every 15 minutes and have long-running queries. Shouldn't select queries work without interruption?

Resolution

After a query conflict on the recovery cluster coordinator segment, any new user query or transaction will try to retrieve the currently cached restore point snapshot, which has already been invalidated and deleted. A new snapshot only becomes available once GPDR has determined that the recovery cluster has reached the target restore point. As a result, queries fail due to an unlogged query conflict that cannot be mitigated by the max_standby_archive_delay parameter.

The following options may help resolve or lessen the issue:

1. Make the query conflict less likely to occur. Set the vacuum_defer_cleanup_age parameter on the primary cluster to a non-zero value; see the documentation for guidance on determining an optimal value. Even a value such as 100 or 1000 may help given a 15-minute restore point creation frequency (assuming the creation frequency matches the restore frequency). See the example after this list.

2. Make GPDR detect that the recovery cluster has reached the target restore point sooner, so that the new restore point snapshot is established faster. You may try the hidden, experimental restore command option '--cluster-wal-replay-poll-frequency' to reduce the polling interval from the default of 30 seconds to a smaller value (e.g. 1s or 500ms). Example usage: 'gpdr restore --type continuous --restore-point latest --cluster-wal-replay-poll-frequency 1s'.

Note that this may produce many more log messages in the recovery cluster coordinator segment's GPDB logs.
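
As a sketch of option 1, assuming a deferral value of 1000 transactions (tune this for your workload), vacuum_defer_cleanup_age can be set on the primary cluster using the standard gpconfig utility and picked up with a configuration reload rather than a full restart:

  # Run on the primary cluster coordinator host
  gpconfig -c vacuum_defer_cleanup_age -v 1000    # defer row cleanup by ~1000 transactions (assumed example value)
  gpstop -u                                       # reload configuration without restarting the cluster
  gpconfig -s vacuum_defer_cleanup_age            # verify the new setting

A larger value reduces the chance that vacuum on the primary removes rows still needed by long-running queries on the recovery cluster, at the cost of retaining more dead rows on the primary.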