Greenplum 7.x: Cluster Will Not Start After Double-Fault Event (Simultaneous Loss of Primary and Mirror Segment)

Article ID: 415641


Updated On:

Products

VMware Tanzu Data Suite, VMware Tanzu Greenplum

Issue/Introduction

After an abrupt shutdown of the VM hosting the Greenplum instance (without a clean Greenplum shutdown), both the primary and its corresponding mirror segment may be down. Upon cluster restart, the error below is encountered, and the cluster fails to start.

FTS: double fault detected for content id XX

Symptoms

  • Both primary and mirror segments are down for a given content ID.
  • The coordinator (formerly called the master) remains up and running.
  • FTS detects the failure and attempts to promote the mirror, but the promotion fails because the mirror is also offline.
  • Error message “double fault detected for content id XX” is recorded in the gp_configuration_history table and logs.
  • A cluster restart (gpstart) does not bring the affected segments online; the acting primary remains in the mirror role and waits for WAL replication.
  • gprecoverseg cannot start the acting mirror, reporting a missing replication slot (see the diagnostic sketch after this list).
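
The double-fault condition can be confirmed from the coordinator with read-only catalog queries. The sketch below is illustrative only; it assumes a psql connection from the coordinator host to the default postgres database, and the exact output will vary by cluster.

# List content IDs for which every segment (primary and mirror) is marked down ('d').
psql -d postgres -c "
  SELECT content
  FROM   gp_segment_configuration
  WHERE  content >= 0
  GROUP  BY content
  HAVING bool_and(status = 'd');"

# Review recent FTS activity; the 'double fault detected for content id XX' entry should appear here.
psql -d postgres -c "
  SELECT * FROM gp_configuration_history
  ORDER BY time DESC
  LIMIT 20;"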

Environment

Greenplum Database 7.x Clusters

Cause

  • In Greenplum 7.x, gpstart requires the mirror segment to have an active WAL receiver. When the cluster is restarted after a double fault, the affected segment (the previous primary or mirror) can start in the wrong role because of the presence of the standby.signal file, and it then expects a WAL receiver in order to synchronize. Because the original promotion never happened, the acting primary has no replication slot, which prevents the mirror from starting and blocks segment recovery (see the sketch after this list).
  • This issue does not exist in Greenplum 6.x, where gpstart does not require the mirror segment's WAL receiver to be up. After a cluster restart, FTS automatically promotes the acting primary and normal operations resume.
  • This scenario is most likely to occur after a non-graceful host VM or hardware shutdown that does not allow Greenplum processes to shut down cleanly.
  • A future version of Greenplum will address this recovery gap; check the Greenplum product release notes for updates.
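
As a read-only illustration of the state described above (not an official recovery procedure), the following sketch checks whether the affected segment's data directory contains a standby.signal file and whether any replication slot has been persisted under pg_replslot/. The data directory path is a placeholder; obtain the real value from gp_segment_configuration for the failed content ID, and do not modify or remove any of these files.

# Placeholder path; substitute the datadir of the affected segment from gp_segment_configuration.
SEGMENT_DATADIR=/data/primary/gpseg0

# A standby.signal file means the instance starts in the mirror (standby) role
# and waits for WAL streaming from its peer.
ls -l "${SEGMENT_DATADIR}/standby.signal"

# Replication slots are persisted under pg_replslot/ in the data directory.
# An empty listing on the acting primary means there is no slot for the mirror to attach to.
ls -l "${SEGMENT_DATADIR}/pg_replslot/"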

Resolution

 

  • This is a known limitation in Greenplum 7.x clusters. A product fix is planned for a future release. 
  • Please subscribe to this article to receive updates related to the fix.
  • Until the fix is available, contact Broadcom support immediately if you encounter this issue.

Additional Information