The error shown below occurs when gpstart is run and the database is unable to start.
20160815:13:33:14:002998 gpstart:mdw:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data/master/gpseg-1 -l /data/master/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 5432 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 690 -c gp_role=utility " start' rc=1, stdout='waiting for server to start...... stopped waiting ', stderr='pg_ctl: PID file "/data/master/gpseg-1/postmaster.pid" does not exist pg_ctl: could not start server Examine the log output.
This error results from configuration changes in the pg_hba.conf or postgresql.conf files. If you are aware of recent changes to these files, review and validate them.
Refer to the articles below to fix this issue:
Check the log file ${MASTER_DATA_DIRECTORY}/pg_log/startup.log (GPDB 6.x) or ${COORDINATOR_DATA_DIRECTORY}/log/startup.log (GPDB 7.x) for any error messages.
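As a rough sketch of that log check, the helper below prints the most recent lines of a startup log that look like errors. The function name show_startup_errors and the severity keywords it searches for are assumptions made for illustration, not part of any Greenplum tooling; adjust the pattern to whatever your log actually contains.

```shell
# Print the last few error-like lines from a startup log.
# Usage: show_startup_errors <path-to-startup.log> [max-lines]
# The function name and the keyword list are illustrative assumptions.
show_startup_errors() {
    local log_file="$1"
    local max_lines="${2:-20}"

    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi

    # Keep only lines containing a severity keyword, then show the newest ones.
    grep -E 'FATAL|PANIC|ERROR' "$log_file" | tail -n "$max_lines"
}

# Example (GPDB 6.x path from the text above):
#   show_startup_errors "${MASTER_DATA_DIRECTORY}/pg_log/startup.log"
```

If nothing is printed, the log contains none of the assumed keywords and should be read in full.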
Search the Broadcom community for other common startup issues not mentioned above. If these articles do not help and the database is still down, contact Pivotal Support and provide the logs related to the error.
1. Run "gpstate -e" to confirm that the segments are down. In the example below, all the segments on sdw2 went down because of a server replacement.
20160923:15:53:38:007795 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -e
20160923:15:53:38:007795 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.5.1 build 1'
20160923:15:53:38:007795 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.5.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on May 14 2015 14:07:14'
20160923:15:53:38:007795 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160923:15:53:39:007795 gpstate:mdw:gpadmin-[INFO]:-Gathering data from segments...
..
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-Segment Mirroring Status Report
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-Segments with Primary and Mirror Roles Switched
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:- Current Primary Port Mirror Port
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:- sdw3-2 50000 sdw2-1 40000
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:- sdw3-2 50001 sdw2-1 40001
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:- sdw3-2 50002 sdw2-1 40002
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:- sdw3-1 50003 sdw2-2 40003
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:- sdw3-1 50004 sdw2-2 40004
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:- sdw3-1 50005 sdw2-2 40005
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:-Primaries in Change Tracking
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:- Current Primary Port Change tracking size Mirror Port
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:- sdw1-2 40003 188 MB sdw2-1 50003
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:- sdw1-2 40004 219 MB sdw2-1 50004
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:- sdw1-2 40005 202 MB sdw2-1 50005
20160923:15:53:41:007795 gpstate:mdw:gpadmin-[INFO]:- sdw3-1 50005 200 MB sdw2-2 40005
2. Run gprecoverseg -a to attempt an incremental recovery of the down segments.
If the filesystem for the down segments has been formatted, or a segment server has been replaced along with its drives, incremental recovery of the segments cannot be done and a full recovery is needed. Before running a full recovery, make sure that the appropriate filesystem partitions and parent folders (/dataX/primary, /dataX/mirror) for the data directories have been created and that the Greenplum binaries have been installed on the server.
Note: Do NOT run a full recovery if the filesystem has not been formatted and incremental recovery is failing. Check the Broadcom community for an article related to the failure, or contact Pivotal support using a Severity 2 classification.
gprecoverseg -F
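The directory pre-check described in this step could be sketched as follows, to be run on the replaced segment host before the full recovery. The function name check_data_dirs is hypothetical, and /data1 in the usage comment is only a stand-in for the /dataX paths on your system.

```shell
# Report which of the given directories are missing.
# Returns 0 if all exist, 1 if any is missing.
# check_data_dirs is a hypothetical helper, not a Greenplum utility.
check_data_dirs() {
    local missing=0
    local dir
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (substitute the real parent folders for the assumed /data1):
#   check_data_dirs /data1/primary /data1/mirror
```

This only confirms that the parent folders exist; it does not verify ownership, permissions, or that the Greenplum binaries are installed.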
3. Check the status of recovery using gpstate -e
gpstate -e
Once recovery has completed, the following status is displayed. If only the mirror is down, the next step can be skipped and a status similar to the one described in step 5 is displayed.
20160923:16:23:30:013261 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -e
20160923:16:23:30:013261 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.5.1 build 1'
20160923:16:23:30:013261 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.5.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on May 14 2015 14:07:14'
20160923:16:23:30:013261 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160923:16:23:31:013261 gpstate:mdw:gpadmin-[INFO]:-Gathering data from segments...
..
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-Segment Mirroring Status Report
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:-Segments with Primary and Mirror Roles Switched
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:- Current Primary Port Mirror Port
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:- sdw3-2 50000 sdw2-1 40000
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:- sdw3-2 50001 sdw2-1 40001
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:- sdw3-2 50002 sdw2-1 40002
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:- sdw3-1 50003 sdw2-2 40003
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:- sdw3-1 50004 sdw2-2 40004
20160923:16:23:34:013261 gpstate:mdw:gpadmin-[INFO]:- sdw3-1 50005 sdw2-2 40005
4. Rebalance segments using the gprecoverseg -ra command.
gprecoverseg -ra
Note: A rebalance will cancel all running transactions. Make sure to schedule a window for running this command if running jobs cannot be canceled.
5. Check the status of recovery during rebalance using the below command:
gpstate -e
When no segments are left to recover or rebalance, the end status from gpstate -e will be displayed.
20160923:17:00:08:065488 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -e
20160923:17:00:08:065488 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.5.1 build 1'
20160923:17:00:08:065488 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.5.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on May 14 2015 14:07:14'
20160923:17:00:08:065488 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160923:17:00:11:065488 gpstate:mdw:gpadmin-[INFO]:-Gathering data from segments...
..
20160923:17:00:14:065488 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:17:00:14:065488 gpstate:mdw:gpadmin-[INFO]:-Segment Mirroring Status Report
20160923:17:00:14:065488 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160923:17:00:14:065488 gpstate:mdw:gpadmin-[INFO]:-All segments are running normally
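To avoid reading the report by eye each time, the end status can be detected with a small filter. This is a minimal sketch: the function name all_segments_normal is hypothetical, and it assumes the healthy message is worded exactly as in the output above, which may differ between Greenplum versions.

```shell
# Read gpstate -e output on stdin and succeed (exit 0) only if it contains
# the healthy end-status message shown in the sample output.
# all_segments_normal is a hypothetical helper name.
all_segments_normal() {
    grep 'All segments are running normally' > /dev/null
}

# Example: poll once a minute until recovery and rebalance have finished.
#   until gpstate -e | all_segments_normal; do sleep 60; done
```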
If root cause analysis is needed for a down segment, then provide the tar archive generated from the below command to Pivotal support. Refer to gpmt for more information:
gpmt gp_log_collector -failed-segs -start 2016-09-23
Note: Change the date used above to the date that your segments went down.