Greenplum 7.x: Cluster Will Not Start After Double-Fault Event (Simultaneous Loss of Primary and Mirror Segment)
Article ID: 415641
Products
VMware Tanzu Data Suite, VMware Tanzu Greenplum
Issue/Introduction
After an abrupt shutdown of the VM hosting the Greenplum instance (without a clean Greenplum shutdown), both the primary and its corresponding mirror segment may be down. Upon cluster restart, the error below is encountered, and the cluster fails to start.
FTS: double fault detected for content id XX
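The same message can typically be found in the coordinator logs and in the gp_configuration_history catalog. The commands below are a minimal sketch that assumes the default CSV log location under the coordinator data directory; adjust paths and database names for your environment.

# Search the coordinator CSV logs for the FTS double-fault message
grep -i "double fault detected" $COORDINATOR_DATA_DIRECTORY/log/*.csv

# Review recent FTS activity recorded in gp_configuration_history
psql -d postgres -c "SELECT * FROM gp_configuration_history ORDER BY time DESC LIMIT 20;"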
Symptoms
Both the primary and mirror segments are down for a given content ID (this can be confirmed with the catalog query shown after this list).
The coordinator (formerly called the master) remains running.
FTS detects the outage and attempts to promote the mirror, but the promotion fails because the mirror is also offline.
The error message "FTS: double fault detected for content id XX" is recorded in the gp_configuration_history table and the coordinator logs.
Restarting the cluster with gpstart does not bring the segments online; the acting primary comes up in a mirror role and waits for WAL replication.
gprecoverseg cannot start the acting mirror and reports a missing replication slot.
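A double fault can be confirmed from the coordinator by looking for a content ID whose primary and mirror are both marked down. The queries below are a minimal sketch against the standard gp_segment_configuration catalog:

-- Show the status of all segments, grouped by content ID
SELECT content, role, preferred_role, mode, status, hostname, port
FROM gp_segment_configuration
WHERE content >= 0
ORDER BY content, role;

-- Content IDs where every segment is down indicate a double fault
SELECT content
FROM gp_segment_configuration
WHERE content >= 0
GROUP BY content
HAVING bool_and(status = 'd');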
Environment
Greenplum Database 7.x Clusters
Cause
In Greenplum 7.x, gpstart requires the mirror segment to have its WAL receiver up. When the cluster is restarted after a double fault, the affected segment (the previous primary or mirror) can start in the wrong role because a standby.signal file is present in its data directory, and it then expects a WAL receiver in order to synchronize. Because the original promotion never completed, the acting primary has no replication slot, which prevents the mirror from starting and blocks segment recovery.
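The incorrect role and the missing replication slot described above can be verified directly on the affected segment host. The sketch below uses a hypothetical data directory path and a placeholder segment port, and assumes a utility-mode connection (the parameter name for utility mode may differ by version):

# A standby.signal file in the data directory means the instance will start
# as a mirror/standby rather than as a primary (path is an example only)
ls -l /data/primary/gpseg0/standby.signal

# Connect to the acting primary in utility mode; an empty result from
# pg_replication_slots means the mirror has no slot to stream from
PGOPTIONS="-c gp_role=utility" psql -p <segment_port> -d postgres \
  -c "SELECT slot_name, slot_type, active FROM pg_replication_slots;"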
This issue does not exist in Greenplum 6.x, where gpstart does not require the mirror segment's WAL receiver to be up. After a cluster restart, FTS automatically promotes the acting primary and normal operation resumes.
This scenario is most likely to occur after a non-graceful host VM or hardware shutdown that does not allow the Greenplum processes to shut down cleanly.
A future version of Greenplum will address this recovery gap; check the Greenplum product release notes for updates.
Resolution
This is a known limitation in Greenplum 7.x clusters. A product fix is planned for a future release.
Please subscribe to this article to receive updates related to the fix.
Until the fix is available, if you encounter this issue, please contact Broadcom Support immediately.