Greenplum Database (GPDB) start issues and segment recovery

Products

VMware Tanzu Greenplum, Greenplum, VMware Tanzu Data Suite

Issue/Introduction

Symptoms:

Common issue starting GPDB

The error shown below appears when gpstart is unable to start the database.

20160815:13:33:14:002998 gpstart:mdw:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1 
Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data/master/gpseg-1 -l /data/master/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 5432 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 690 -c gp_role=utility " start' 
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/data/master/gpseg-1/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.

Environment

Cause

This error is typically the result of configuration changes in the pg_hba.conf or postgresql.conf files. If you are aware of recent changes to these files, review and validate them.
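One way to spot such changes is to look at the modification times of the two files. The following is a minimal sketch, not part of any Greenplum tooling: the function name recent_config_changes and the default window of 7 days are assumptions made for illustration, and the example path is the master data directory from the error output above.

```shell
# List pg_hba.conf / postgresql.conf in a data directory if they were
# modified within the last N days (N defaults to 7, an arbitrary choice).
recent_config_changes() {
    data_dir=$1
    days=${2:-7}
    find "$data_dir" -maxdepth 1 -type f \
        \( -name pg_hba.conf -o -name postgresql.conf \) \
        -mtime "-$days" -print
}

# Example:
#   recent_config_changes /data/master/gpseg-1 3
```

An empty result only means the files were not touched in that window; it does not prove that their contents are valid.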

Resolution

Refer to the articles below to fix this issue:

Check the log file ${MASTER_DATA_DIRECTORY}/pg_log/startup.log (GPDB 6.x) or ${COORDINATOR_DATA_DIRECTORY}/log/startup.log (GPDB 7.x) for any error messages.
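A quick way to scan the startup log is to filter it for high-severity lines. The sketch below assumes that error lines contain one of the keywords FATAL, PANIC, or ERROR; the function name show_startup_errors and the default of 20 lines are illustrative choices, not part of Greenplum.

```shell
# Print the most recent lines of a startup log that look like errors.
# The severity keywords are an assumption about the log format.
show_startup_errors() {
    log_file=$1
    max_lines=${2:-20}
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    grep -E 'FATAL|PANIC|ERROR' "$log_file" | tail -n "$max_lines"
}

# GPDB 6.x: show_startup_errors "${MASTER_DATA_DIRECTORY}/pg_log/startup.log"
# GPDB 7.x: show_startup_errors "${COORDINATOR_DATA_DIRECTORY}/log/startup.log"
```

If the filter prints nothing, read the end of the log directly, since the relevant message may use a different severity.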

Search the Broadcom community for other common startup issues not mentioned above. If these articles do not help and the database is still down, contact Pivotal Support and provide the logs related to the error.

gpstate shows segments down

1. Run "gpstate -e" to confirm that the segments are down. In the example below, all the segments on sdw2 went down because of a server replacement.

20160923:15:53:38:007795 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -e
20160923:15:53:38:007795 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.5.1 build 1'
20160923:15:53:38:007795 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.5.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on May 14 2015 14:07:14'
20160923:15:53:38:007795 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160923:15:53:39:007795 gpstate:mdw:gpadmin-[INFO]:-Gathering data from segments...
.. 
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-Segment Mirroring Status Report
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-Segments with Primary and Mirror Roles Switched
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-   Current Primary   Port    Mirror   Port
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-   sdw3-2            50000   sdw2-1   40000
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-   sdw3-2            50001   sdw2-1   40001
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-   sdw3-2            50002   sdw2-1   40002
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-   sdw3-1            50003   sdw2-2   40003
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-   sdw3-1            50004   sdw2-2   40004
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-   sdw3-1            50005   sdw2-2   40005
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-Primaries in Change Tracking
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-   Current Primary   Port    Change tracking size   Mirror   Port
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-   sdw1-2            40003   188 MB                 sdw2-1   50003
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-   sdw1-2            40004   219 MB                 sdw2-1   50004
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-   sdw1-2            40005   202 MB                 sdw2-1   50005
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-   sdw3-1            50005   200 MB                 sdw2-2   40005
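On a large cluster it can help to summarize which mirror hosts are affected. The following sketch counts the change-tracking entries per mirror host. It assumes the column layout of the 4.3.x sample output above (the mirror host is the seventh whitespace-separated field of each data row); other versions may format this report differently. The function name change_tracking_summary is hypothetical.

```shell
# Read "gpstate -e" output on stdin and print "<mirror host> <count>"
# for every mirror listed under "Primaries in Change Tracking".
change_tracking_summary() {
    awk '
        /Primaries in Change Tracking/ { in_section = 1; next }
        # A dashed separator line ends the section.
        in_section && /\[INFO\]:-+$/   { in_section = 0 }
        # Data rows have 8 fields; the column header row starts with "Current".
        in_section && NF == 8 && $3 != "Current" { count[$7]++ }
        END { for (host in count) printf "%s %d\n", host, count[host] }
    ' | sort
}

# Example:
#   gpstate -e | change_tracking_summary
```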

2. Run gprecoverseg -a to attempt an incremental recovery of the down segments.

If the filesystem for the down segments has been formatted, or the segment server was replaced together with its drives, incremental recovery is not possible and a full recovery is needed. Before running a full recovery, make sure that the filesystem partitions and parent folders (/dataX/primary /dataX/mirror) for the data directories have been created and that the Greenplum binaries have been installed on the server.
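The directory prerequisite can be checked with a short script run on the replaced segment host as the user that owns the data directories. This is a sketch only: the function name check_segment_dirs is hypothetical, and /data1/primary and /data1/mirror in the example stand in for whatever /dataX paths your cluster uses.

```shell
# Report whether each parent directory exists and is writable by the
# current user. Returns non-zero if any directory fails the check.
check_segment_dirs() {
    missing=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok       $dir"
        else
            echo "missing  $dir"
            missing=1
        fi
    done
    return "$missing"
}

# Example (hypothetical mount points):
#   check_segment_dirs /data1/primary /data1/mirror
```

The check does not verify that the Greenplum binaries are installed; confirm that separately.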

Note: Do NOT run a full recovery if the filesystem has not been formatted and incremental recovery is failing. Instead, check the Broadcom community for an article related to the failure, or contact Pivotal Support using a Severity 2 classification.

gprecoverseg -F

3. Check the status of recovery using gpstate -e

gpstate -e

Once recovery has completed, the status below is displayed. If only mirrors were down, the next step can be skipped, and the status will be similar to the one shown in step 5.

20160923:16:23:30:013261 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -e
20160923:16:23:30:013261 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.5.1 build 1'
20160923:16:23:30:013261 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.5.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on May 14 2015 14:07:14'
20160923:16:23:30:013261 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160923:16:23:31:013261 gpstate:mdw:gpadmin-[INFO]:-Gathering data from segments...
.. 
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-Segment Mirroring Status Report
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-Segments with Primary and Mirror Roles Switched
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-   Current Primary   Port    Mirror   Port
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-   sdw3-2            50000   sdw2-1   40000
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-   sdw3-2            50001   sdw2-1   40001
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-   sdw3-2            50002   sdw2-1   40002
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-   sdw3-1            50003   sdw2-2   40003
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-   sdw3-1            50004   sdw2-2   40004
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-   sdw3-1            50005   sdw2-2   40005

4. Rebalance segments using the gprecoverseg -ra command.

gprecoverseg -ra

Note: A rebalance will cancel all running transactions. Make sure to schedule a window for running this command if running jobs cannot be canceled.

5. Check the status of recovery during rebalance using the below command:

gpstate -e

When no segments are left to recover or rebalance, gpstate -e reports that all segments are running normally:

20160923:17:00:08:065488 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -e
20160923:17:00:08:065488 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.5.1 build 1'
20160923:17:00:08:065488 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.5.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on May 14 2015 14:07:14'
20160923:17:00:08:065488 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160923:17:00:11:065488 gpstate:mdw:gpadmin-[INFO]:-Gathering data from segments...
.. 
20160923:17:00:14:065488 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:17:00:14:065488 gpstate:mdw:gpadmin-[INFO]:-Segment Mirroring Status Report
20160923:17:00:14:065488 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:17:00:14:065488 gpstate:mdw:gpadmin-[INFO]:-All segments are running normally
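The three reports shown in steps 1, 3, and 5 map onto the next action to take. The sketch below turns that mapping into a small helper. It recognizes only the three headings that appear in the sample output in this article, and it does not detect a recovery that is still in progress, so treat its answer as a hint rather than a decision. The function name next_recovery_step is hypothetical.

```shell
# Read "gpstate -e" output on stdin and print a hint for the next step.
next_recovery_step() {
    report=$(cat)
    case $report in
        *"Primaries in Change Tracking"*)
            echo "recover" ;;      # step 2: run gprecoverseg
        *"Segments with Primary and Mirror Roles Switched"*)
            echo "rebalance" ;;    # step 4: run gprecoverseg -ra
        *"All segments are running normally"*)
            echo "healthy" ;;      # nothing left to do
        *)
            echo "unknown" ;;      # inspect the report manually
    esac
}

# Example:
#   gpstate -e | next_recovery_step
```

Change tracking is checked first because a rebalance should only follow once no segments remain to be recovered.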

If root cause analysis is needed for a down segment, provide Pivotal Support with the tar archive generated by the command below. Refer to the gpmt documentation for more information:

gpmt gp_log_collector -failed-segs -start 2016-09-23

Note: Change the date used above to the date that your segments went down.