FTS (fault tolerance server) in Pivotal Greenplum (GPDB)

Article ID: 296248


Products

VMware Tanzu Greenplum

Issue/Introduction

FTS is a critical component of Pivotal Greenplum (GPDB).

The following topics are covered in this article:

  • What is FTS?
  • How does the GPDB Segment Fault Prober work?
  • How to bring up failed mirror segments?
  • Explanation for gp_segment_configuration
  • Why is the mirror being marked down?
  • FTS-related parameters


Environment

Product Version: All versions of Greenplum

Resolution

What is FTS?

On the GPDB master host, a fault prober process is forked and monitored by the postmaster postgres process. This fault prober process is also called the FTS (fault tolerance server) process, and it is restarted by the postmaster if it fails.


How does the GPDB Segment Fault Prober work?

The FTS runs in a continuous loop, sleeping between iterations based on the parameters described below:

  • In each loop, FTS "probes" each primary segment database by making a TCP socket connection to it, using the hostname and port registered in the gp_segment_configuration table. If the connection cannot be made, or no reply is received within the timeout period (gp_fts_probe_timeout), the probe is retried. The number of attempts is controlled by gp_fts_probe_retries.
  • When a segment is probed, it runs a few simple checks, such as a stat() system call on the critical segment directories and a check for internal faults. If there are no issues, it sends a positive reply and FTS takes no action for that segment database. If the maximum number of probe attempts fails, FTS probes the corresponding mirror to confirm it is up, then updates the gp_segment_configuration table, marking the primary segment down and transitioning the mirror to be the primary. FTS also records the operations performed in the gp_configuration_history table; both tables can be queried directly, as shown in the example after this list.
  • When a primary segment is up but its corresponding mirror is down, the primary goes into Change Tracking Mode. In Change Tracking Mode, changes to the segment are recorded so the mirror can be resynchronized without a full copy of the data from the primary to the mirror.
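
When FTS marks a segment down, the result is visible in the catalog tables named above. As a minimal illustration (run from a psql session on the master; both tables are standard GPDB catalogs), the following queries list any segments currently marked down and the most recent FTS actions:

    -- Segments currently marked down by FTS (status 'd').
    SELECT content, dbid, role, preferred_role, hostname, port
    FROM gp_segment_configuration
    WHERE status = 'd';

    -- History of FTS state changes, most recent first.
    SELECT *
    FROM gp_configuration_history
    ORDER BY time DESC
    LIMIT 20;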


How to bring up failed mirror segments?

To bring up a down mirror, run the gprecoverseg utility. By default, this command performs an incremental recovery: it puts the mirror into resync mode and starts recovering the tracked changes from the primary to the mirror. If an incremental recovery cannot be completed, the recovery fails, and gprecoverseg should be run again with the "-F" option. This forces a full recovery, causing the primary to copy all of its data over to the mirror to bring the pair back into sync.
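
While a recovery is running, progress can be watched in gp_segment_configuration. As an illustrative check (mode values from GPDB 4.x/5.x: 'r' means the pair is resynchronizing, 's' means it is back in sync):

    -- Segment pairs not yet back in sync after gprecoverseg;
    -- an empty result means all pairs have returned to mode 's'.
    SELECT content, role, mode, status, hostname, port
    FROM gp_segment_configuration
    WHERE mode <> 's'
    ORDER BY content;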


Refer to this article for more information.


Explanation for gp_segment_configuration

The mode of each segment (change tracking, resync, or in sync) can be seen in the gp_segment_configuration table, along with its status (up or down). Check this article for more information.
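
For example, the following query lists each segment pair with its mode and status (a minimal sketch; in GPDB 4.x/5.x the mode values are 'c' for change tracking, 'r' for resync, and 's' for in sync, and status is 'u' for up or 'd' for down):

    -- Mode and status for every segment, primaries listed before mirrors.
    SELECT content, role, preferred_role, mode, status, hostname, port
    FROM gp_segment_configuration
    ORDER BY content, role DESC;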


There are also columns in gp_segment_configuration called 'role' and 'preferred_role'. Each can have the value 'p' for primary or 'm' for mirror. 'role' shows the current role of a segment database, and 'preferred_role' shows its original role. In a balanced system, 'role' and 'preferred_role' match for all segments. If they do not match, there may be a skew in the number of active primaries on each hardware host. To rebalance the segments and bring all of them back to their preferred role, run gprecoverseg with the "-r" option.
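
A quick way to check whether the system is balanced (an empty result means every segment is in its preferred role):

    -- Segments running outside their preferred role;
    -- if any rows are returned, gprecoverseg -r can rebalance them.
    SELECT content, role, preferred_role, hostname, port
    FROM gp_segment_configuration
    WHERE role <> preferred_role;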


Why is the mirror being marked down?

In addition to FTS probing, there is another set of events that can cause a mirror to be marked down. As data is written from a primary segment to its mirror, the primary detects whether the data can be delivered. The data is queued, and if gp_segment_connect_timeout seconds pass without the primary being able to send more data to the mirror, the primary declares a mirror failure: the mirror is marked down and the primary goes into change tracking mode.
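
The outcome of this scenario can also be seen in gp_segment_configuration. As an illustrative query, this joins each primary in change tracking mode with its down mirror:

    -- Primaries in change tracking ('c') paired with their down mirrors.
    SELECT p.content,
           p.hostname AS primary_host, p.mode AS primary_mode,
           m.hostname AS mirror_host, m.status AS mirror_status
    FROM gp_segment_configuration p
    JOIN gp_segment_configuration m
      ON m.content = p.content AND m.role = 'm'
    WHERE p.role = 'p' AND p.mode = 'c';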


FTS-related parameters

gp_fts_probe_interval

  • The probe loop interval, in seconds: a new probe loop starts every X seconds.
  • For example, if the setting is 60 and a probe loop takes 10 seconds, FTS sleeps for 50 seconds before the next loop.
  • For example, if the setting is 60 and a probe loop takes 75 seconds, FTS does not sleep and the next loop starts immediately.
  • Default Setting: 60

gp_fts_probe_timeout

  • Probe timeout between master and segment (seconds).
  • Default Setting: 20

gp_fts_probe_retries

  • The number of times FTS attempts to probe a segment before declaring a failure.
  • For example, if the setting is 5, there will be 4 retries after the first failed attempt.
  • Default Setting: 5

gp_segment_connect_timeout

  • The maximum time (in seconds) allowed for a mirror to respond before the primary declares a mirror failure.
  • Default Setting: 180

gp_log_fts

  • Sets the verbosity of logged messages pertaining to fault probing.
  • Valid values are "off", "terse", "verbose" and "debug".
  • The verbose setting can be used in production and provides useful data for troubleshooting; debug should not be used in production.
  • Default Setting: terse
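
The current values of these parameters can be read from the master. As a quick check (pg_settings is the standard catalog view; on versions where a parameter is not listed there, gpconfig -s <name> reports it instead):

    -- Current values of the FTS-related parameters.
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('gp_fts_probe_interval',
                   'gp_fts_probe_timeout',
                   'gp_fts_probe_retries',
                   'gp_segment_connect_timeout',
                   'gp_log_fts');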