Standby Master Replication Failure Due to Missing WAL Segments in Greenplum Database

Article ID: 394433

Products

VMware Tanzu Data Suite, Greenplum, VMware Tanzu Greenplum

Issue/Introduction

In Greenplum Database environments that use streaming replication, the standby master stays synchronized with the primary master by continuously receiving and applying Write-Ahead Log (WAL) segments. A common failure occurs when the standby master requests a WAL segment that the primary has already recycled or removed: the standby can no longer catch up, and replication stops.

The standby master may be reported as "Down" in Greenplum Command Center.

Running gpstate -f displays:

Standby status           = Standby host passive
...
No entries found in pg_stat_replication
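
The check above can be scripted. The following is a minimal sketch, not a Greenplum utility: the function name standby_replication_missing is hypothetical, and it assumes the "No entries found" wording shown above matches the gpstate output of your Greenplum version.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Greenplum.
# Reads `gpstate -f` output on stdin and exits 0 when the output says
# that pg_stat_replication has no entries, i.e. the primary master has
# no active WAL sender for the standby.
standby_replication_missing() {
    grep -q 'No entries found'
}

# Example usage on the primary master host:
#   if gpstate -f | standby_replication_missing; then
#       echo "standby master is not streaming from the primary"
#   fi
```

A non-zero exit status only means the line was not found; it does not by itself prove that the standby is fully synchronized.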

 

Standby master logs contain errors such as:

ERROR: could not receive data from WAL stream: ERROR: requested WAL segment ############### has already been removed
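
To confirm which segment the standby is asking for, the error can be extracted from the standby master log. The sketch below is a hypothetical helper (removed_wal_segments is not a Greenplum command) that reads log text on standard input and prints each distinct segment name from messages of the form shown above. It assumes segment names consist of hexadecimal digits, as in standard PostgreSQL WAL file names.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Greenplum.
# Reads standby master log text on stdin and prints every distinct WAL
# segment name that the primary reported as already removed.
removed_wal_segments() {
    sed -n 's/.*requested WAL segment \([0-9A-Fa-f]\{1,\}\) has already been removed.*/\1/p' \
        | sort -u
}

# Example usage (replace <standby_log_file> with the actual log file):
#   removed_wal_segments < <standby_log_file>
```

If the function prints nothing, the log contains no such message and the replication failure likely has a different cause.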

 

Cause

This issue occurs when the standby master falls behind the primary master and the WAL segments it still needs have already been removed from the primary. Without access to those segments, the standby cannot catch up, and replication fails.

Resolution

To restore replication, the standby master can be reinitialized to synchronize with the current state of the primary master.

Steps:

  1. Remove the Existing Standby Master Configuration:
    • On the primary master host, execute: gpinitstandby -r
    • This command removes the current standby master configuration from the Greenplum system catalog.
  2. Add and Initialize the Standby Master:
    • On the primary master host, run: gpinitstandby -s <standby_hostname>
    • Replace <standby_hostname> with the actual hostname of your standby master server.
    • This command sets up the standby master and starts the synchronization process.
    • Note: This operation will shut down the Greenplum Database system temporarily. Perform it during a scheduled maintenance window to minimize disruption.
  3. Verify the Standby Master Status:
    • After reinitialization, confirm that the standby master is functioning correctly by running: gpstate -f
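
The three steps above can be collected into one script. This is a sketch under stated assumptions rather than a supported procedure: the run and reinit_standby functions and the DRY_RUN variable are hypothetical names introduced for illustration, and the script assumes it is executed on the primary master host by the Greenplum administrative user with the Greenplum environment already loaded. The gpinitstandby utility may still prompt for confirmation when it runs.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the three steps above; not part of Greenplum.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinit_standby() {
    local standby_host="${1:-}"
    if [ -z "$standby_host" ]; then
        echo "usage: reinit_standby <standby_hostname>" >&2
        return 2
    fi
    # Step 1: remove the existing standby master configuration.
    run gpinitstandby -r || return 1
    # Step 2: add and initialize the standby master on the given host.
    run gpinitstandby -s "$standby_host" || return 1
    # Step 3: display the standby master status for verification.
    run gpstate -f
}

# Example usage (smdw is a placeholder hostname):
#   DRY_RUN=1 reinit_standby smdw    # preview the commands
#   reinit_standby smdw              # execute them
```

Previewing with DRY_RUN=1 first makes it easy to confirm the hostname before anything is changed. The function stops at the first failing step, so a failed removal is not followed by an attempt to add the standby.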