Typos in configuration parameters, ports that are already in use, and various other issues can prevent segments from coming online. In some scenarios, the error thrown by the gpstart utility is not very informative.
This article describes the initial troubleshooting process to get more information about this error and fix it.
As an example, this article addresses the process of troubleshooting GPDB's failure to start one or more segments due to a bad entry in one or more postgresql.conf files across the segments.
The gpstart utility is unable to start some segments and throws an error message similar to the following when it runs:
20161013:08:11:21:454037 gpstart:mdw:gpadmin-[INFO]:-DBID:45 FAILED host:'sdw2.gphd.local' datadir:'/data1/mirror/gpseg20' with reason:'PG_CTL failed.'
20161013:08:11:21:454037 gpstart:mdw:gpadmin-[INFO]:-DBID:13 FAILED host:'sdw2.gphd.local' datadir:'/data1/primary/gpseg11' with reason:'PG_CTL failed.'
20161013:08:11:21:454037 gpstart:mdw:gpadmin-[INFO]:-DBID:49 FAILED host:'sdw2.gphd.local' datadir:'/data2/mirror/gpseg25' with reason:'PG_CTL failed.'
20161013:08:11:21:454037 gpstart:mdw:gpadmin-[INFO]:-DBID:44 FAILED host:'sdw2.gphd.local' datadir:'/data1/mirror/gpseg21' with reason:'PG_CTL failed.'
20161013:08:11:21:454037 gpstart:mdw:gpadmin-[INFO]:-DBID:12 FAILED host:'sdw2.gphd.local' datadir:'/data1/primary/gpseg10' with reason:'PG_CTL failed.'
20161013:08:11:21:454037 gpstart:mdw:gpadmin-[INFO]:-DBID:16 FAILED host:'sdw2.gphd.local' datadir:'/data2/primary/gpseg14' with reason:'PG_CTL failed.'
20161013:08:11:21:454037 gpstart:mdw:gpadmin-[INFO]:-DBID:46 FAILED host:'sdw2.gphd.local' datadir:'/data2/mirror/gpseg29' with reason:'PG_CTL failed.'
20161013:08:11:21:454037 gpstart:mdw:gpadmin-[INFO]:-DBID:42 FAILED host:'sdw2.gphd.local' datadir:'/data1/mirror/gpseg7' with reason:'PG_CTL failed.'
20161013:08:11:21:454037 gpstart:mdw:gpadmin-[INFO]:-DBID:43 FAILED host:'sdw2.gphd.local' datadir:'/data1/mirror/gpseg6' with reason:'PG_CTL failed.'
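When many segments fail at once, it can help to summarise the FAILED entries by host and data directory before opening individual logs. The following is a minimal Python sketch, not part of the gpstart utility. The function names (parse_gpstart_failures, group_by_host) and the regular expression are illustrative assumptions derived only from the sample output shown above.

```python
import re
from collections import defaultdict

# Pattern inferred from the sample gpstart output in this article.
# It is an assumption, not an official description of the log format.
FAILED_RE = re.compile(
    r"DBID:(?P<dbid>\d+)\s+FAILED\s+host:'(?P<host>[^']*)'\s+"
    r"datadir:'(?P<datadir>[^']*)'\s+with reason:'(?P<reason>[^']*)'"
)


def parse_gpstart_failures(text):
    """Return one dict per FAILED segment entry found in the given text."""
    failures = []
    for match in FAILED_RE.finditer(text):
        entry = match.groupdict()
        entry["dbid"] = int(entry["dbid"])
        failures.append(entry)
    return failures


def group_by_host(failures):
    """Map each host name to the list of data directories that failed on it."""
    grouped = defaultdict(list)
    for failure in failures:
        grouped[failure["host"]].append(failure["datadir"])
    return dict(grouped)


if __name__ == "__main__":
    sample = (
        "20161013:08:11:21:454037 gpstart:mdw:gpadmin-[INFO]:-DBID:45 FAILED "
        "host:'sdw2.gphd.local' datadir:'/data1/mirror/gpseg20' "
        "with reason:'PG_CTL failed.'"
    )
    for host, datadirs in group_by_host(parse_gpstart_failures(sample)).items():
        print(host, datadirs)
```

Because the pattern is matched anywhere in the text, the sketch works on a whole log file read into a string as well as on a single line.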
Depending on the number of segments that fail to start, the database might be unable to start at all. The database will not start if both the primary and the mirror segment for a given content ID are down.
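The rule above can be expressed as a small check. This is a sketch in Python, assuming the content ID and role of each failed segment are already known (for example, looked up by DBID in the gp_segment_configuration catalog table, where role is 'p' for primary and 'm' for mirror). The function name and the tuple layout are hypothetical, and the sample values are made up for illustration.

```python
from collections import defaultdict


def contents_fully_down(failed_segments):
    """Return the content IDs for which both primary and mirror are down.

    failed_segments is an iterable of (dbid, content, role) tuples,
    where role is 'p' (primary) or 'm' (mirror).
    """
    roles_down = defaultdict(set)
    for _dbid, content, role in failed_segments:
        roles_down[content].add(role)
    # A content is unavailable only when both of its roles failed.
    return sorted(c for c, roles in roles_down.items() if {"p", "m"} <= roles)


if __name__ == "__main__":
    # Hypothetical data: content 11 lost both copies, content 20 only its mirror.
    failed = [(13, 11, "p"), (45, 20, "m"), (60, 11, "m")]
    print(contents_fully_down(failed))  # prints [11]
```

An empty result means every content still has at least one segment that did not fail, so the remaining failures reduce redundancy but should not by themselves block startup.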
Gather more information about the error:
1. Check the gpstart log file (normally under /home/gpadmin/gpAdminLogs) and read the error message it reports. This usually helps identify which specific segments are affected.
2. Check the startup.log file in the pg_log directory (under the data directory) of the affected segment(s) and look for clues about what happened at the time gpstart was run.
3. Check the segment logs in the pg_log directory of the affected segment(s) to get more information about what the segment was doing before it failed to start.
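For steps 2 and 3, the most recent entries are usually the relevant ones. The following Python sketch builds the path to a segment's startup.log and returns the last few lines of a log file. The helper names (startup_log_path, tail_lines) are hypothetical, and the log directory name is passed in explicitly because it is an assumption that may differ between installations.

```python
import os
from collections import deque


def startup_log_path(datadir, log_dir_name):
    """Build the expected path of startup.log for one segment data directory."""
    return os.path.join(datadir, log_dir_name, "startup.log")


def tail_lines(path, count=20):
    """Return the last `count` lines of a text file.

    The file is streamed line by line, so large logs are not loaded
    into memory in full.
    """
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

For example, on the segment host, tail_lines(startup_log_path("/data1/mirror/gpseg20", "pg_log"), 10) would return the last ten lines logged for that segment, provided the directory name matches your installation.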
After reviewing those locations, if it is still unclear how to proceed, the first step is to search the Broadcom Knowledge Base for articles that explain the issue and provide a solution. A useful tip for finding relevant articles is to copy the error message string shown in the log file and use it as the search term.
If help from Pivotal Support is needed, attaching the results of this initial analysis and the error messages found in the logs when creating the ticket will speed up the troubleshooting process.
Example: If a bad line has been added to one or more postgresql.conf files across the cluster, the startup.log will contain entries similar to the following:
2016-10-13 11:31:59.399552 GMT,,,p278001,th2045376288,,,,0,,,seg-1,,,,,"FATAL","42601","syntax error in file ""/data1/mirror/gpseg20/postgresql.conf"" line 553, near token ""-""",,,,,,,,"ParseConfigFile","guc-file.l",369,
2016-10-13 11:32:25.338118 GMT,,,p278456,th-958068960,,,,0,,,seg-1,,,,,"FATAL","42601","syntax error in file ""/data1/mirror/gpseg20/postgresql.conf"" line 553, near token ""-""",,,,,,,,"ParseConfigFile","guc-file.l",369,
2016-10-13 11:34:50.139285 GMT,,,p279433,th1119868704,,,,0,,,seg-1,,,,,"FATAL","42601","syntax error in file ""/data1/mirror/gpseg20/postgresql.conf"" line 553, near token ""-""",,,,,,,,"ParseConfigFile","guc-file.l",369,
2016-10-13 11:49:37.776982 GMT,,,p283593,th-170842336,,,,0,,,seg-1,,,,,"FATAL","42601","syntax error in file ""/data1/mirror/gpseg20/postgresql.conf"" line 553, near token ""-""",,,,,,,,"ParseConfigFile","guc-file.l",369,
2016-10-13 11:49:38.095468 GMT,,,p283715,th-1851304160,,,,0,,,seg-1,,,,,"FATAL","42601","syntax error in file ""/data1/mirror/gpseg20/postgresql.conf"" line 553, near token ""-""",,,,,,,,"ParseConfigFile","guc-file.l",369,
2016-10-13 11:55:05.388579 GMT,,,p285528,th-2014513376,,,,0,,,seg-1,,,,,"FATAL","42601","syntax error in file ""/data1/mirror/gpseg20/postgresql.conf"" line 553, near token ""-""",,,,,,,,"ParseConfigFile","guc-file.l",369,
2016-10-13 12:01:40.237405 GMT,,,p288805,th-884390112,,,,0,,,seg-1,,,,,"FATAL","42601","syntax error in file ""/data1/mirror/gpseg20/postgresql.conf"" line 553, near token ""-""",,,,,,,,"ParseConfigFile","guc-file.l",369,
2016-10-13 12:02:00.935112 GMT,,,p289261,th-577505504,,,,0,,,seg-1,,,,,"FATAL","42601","syntax error in file ""/data1/mirror/gpseg20/postgresql.conf"" line 553, near token ""-""",,,,,,,,"ParseConfigFile","guc-file.l",369,
This means that there is a problem on line 553 of the postgresql.conf file of the affected segment (in this case gpseg20). Fixing this line will allow PG_CTL to start the segment successfully the next time gpstart runs.
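Because the same FATAL entry is logged on every start attempt, it can be convenient to reduce the startup.log to the distinct configuration files and line numbers it complains about, and then display the offending line. The following is a minimal Python sketch under stated assumptions: the log entries are comma-separated with double-quoted fields as in the sample above, and the message wording matches that sample. The function names (find_config_syntax_errors, read_config_line) are hypothetical.

```python
import csv
import re

# Message wording taken from the sample startup.log entries in this article.
SYNTAX_ERR_RE = re.compile(
    r'syntax error in file "(?P<path>[^"]+)" line (?P<line>\d+)'
)


def find_config_syntax_errors(log_lines):
    """Return sorted, de-duplicated (config_path, line_number) pairs.

    log_lines is any iterable of CSV-formatted log lines, such as an
    open file object. Only rows containing a FATAL field are considered.
    """
    found = set()
    for row in csv.reader(log_lines):
        if "FATAL" not in row:
            continue
        for field in row:
            match = SYNTAX_ERR_RE.search(field)
            if match:
                found.add((match.group("path"), int(match.group("line"))))
    return sorted(found)


def read_config_line(path, line_number):
    """Return the text of the given 1-based line, or None if it does not exist."""
    with open(path, "r", errors="replace") as handle:
        for number, text in enumerate(handle, start=1):
            if number == line_number:
                return text.rstrip("\n")
    return None
```

Run on the affected segment host, the pairs returned by find_config_syntax_errors can be passed to read_config_line to print the exact line that needs to be corrected. The sketch only reads files; editing postgresql.conf remains a manual step.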