gprecoverseg fails with the following error (gpstart can fail in the same way):
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:-Recovery type = Standard
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:-Recovery 1 of 1
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:- Synchronization mode = Incremental
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance host = sdw15
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance address = sdw15cc
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance directory = /data2/primary/gpseg77
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance port = 1030
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance replication port = 1094
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance host = sdw18
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance address = sdw18cc
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance directory = /data1/mirror/gpseg77
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance port = 1153
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance replication port = 1217
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Target = in-place
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:-1 segment(s) to recover
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:-Ensuring 1 failed segment(s) are stopped
20200405:03:12:25:012055 gprecoverseg:mdw:gpadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
20200405:03:12:31:012055 gprecoverseg:mdw:gpadmin-[INFO]:-Updating configuration with new mirrors
20200405:03:12:31:012055 gprecoverseg:mdw:gpadmin-[INFO]:-Updating mirrors
20200405:03:12:37:012055 gprecoverseg:mdw:gpadmin-[INFO]:-Starting mirrors
20200405:03:12:37:012055 gprecoverseg:mdw:gpadmin-[INFO]:-era is 05f98c23bc49c962_200405025756
20200405:03:12:38:012055 gprecoverseg:mdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
20200405:03:22:44:012055 gprecoverseg:mdw:gpadmin-[INFO]:-Process results...
20200405:03:22:44:012055 gprecoverseg:mdw:gpadmin-[WARNING]:-Failed to start segment. The fault prober will shortly mark it as down. Segment: sdw15:/data2/primary/gpseg77:content=77:dbid=128:mode=r:status=d: REASON: PG_CTL failed.
The startup.log for the failed segment, /data2/primary/gpseg77 on host sdw15, shows:
2020-04-05 03:39:02.807658 EDT,,,p17621,th702179104,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2020-04-05 03:39:02.809065 EDT,,,p17621,th702179104,,,,0,,,seg-1,,,,,"LOG","XX000","could not bind IPv4 socket: Address already in use (pqcomm.c:456)",,"Is another postmaster already running on port 1030? If not, wait a few seconds and retry.",,,,,,"StreamServerPort","pqcomm.c",456,
2020-04-05 03:39:02.809984 EDT,,,p17621,th702179104,,,,0,,,seg-1,,,,,"LOG","XX000","could not create IPv6 socket: Address family not supported by protocol (pqcomm.c:390)",,,,,,,,"StreamServerPort","pqcomm.c",390,
2020-04-05 03:39:02.810163 EDT,,,p17621,th702179104,,,,0,,,seg-1,,,,,"WARNING","01000","could not create listen socket for ""*""",,,,,,,,"PostmasterMain","postmaster.c",1361,
2020-04-05 03:39:02.810291 EDT,,,p17621,th702179104,,,,0,,,seg-1,,,,,"FATAL","XX000","could not create any TCP/IP sockets (postmaster.c:1366)",,,,,,,,"PostmasterMain","postmaster.c",1366,
1 0xb0d20e postgres errstart (elog.c:502)
2 0x8fed37 postgres PostmasterMain (postmaster.c:1365)
3 0x7feb6f postgres main (main.c:206)
4 0x3186c1ed20 libc.so.6 __libc_start_main + 0x100
5 0x4bee79 postgres <symbol not found> + 0x4bee79
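
When the log is long, the port named in the bind failure can be pulled out mechanically. The following is a minimal sketch, not part of the original procedure: `ports_in_bind_errors` is a hypothetical helper name, and it assumes only that the log contains the hint text "already running on port N" exactly as in the excerpt above.

```shell
# Hypothetical helper (not a Greenplum utility): print each distinct port
# number that appears in an "already running on port N" hint.
# Reads log text on stdin, so it works with a file redirect or a pipe.
ports_in_bind_errors() {
    grep -o 'already running on port [0-9][0-9]*' |
        awk '{ print $NF }' |   # last field of the match is the port number
        sort -un                # one line per distinct port
}
```

It could be run as `ports_in_bind_errors < startup.log` from the directory that holds the segment's startup.log.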
A segment can fail to start for many reasons. One of them is shown in the startup.log above: "could not bind IPv4 socket: Address already in use", with the hint "Is another postmaster already running on port 1030?". Checking on sdw15 confirms that port 1030 is in use:
[gpadmin@sdw15 pg_log]$ netstat -nalp | grep 1030
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:1030 0.0.0.0:* LISTEN -
The port is indeed held by another process, but because netstat was run as gpadmin rather than root, the owning process is not shown. Running the same command as root:
[root@sdw15 ~]# netstat -nalp | grep 1030
tcp 0 0 127.0.0.1:1030 0.0.0.0:* LISTEN 10741/vnetd
Now the owner is visible: process 10741 (vnetd). To fix the problem, terminate this process to release port 1030 for the segment. For a permanent solution, ask the OS administrator to reconfigure any service that uses a port in the Greenplum port range, so that the conflict does not recur.
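
A plain `grep 1030` also matches unrelated lines, such as port 10301 or a process whose PID is 1030. The sketch below narrows the check to sockets listening on exactly the given port. It is an illustration under stated assumptions: `listeners_on_port` is a hypothetical helper name, and the input is assumed to have the same column layout as the `netstat -nalp` output shown above.

```shell
# Hypothetical helper (not a Greenplum utility): read `netstat -nalp` output
# on stdin and print the "PID/program" column for every TCP socket that is
# listening on exactly the given port. Prints "-" when netstat could not
# identify the owner, for example when it was not run as root.
listeners_on_port() {
    port="$1"
    awk -v port="$port" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")   # local address has the form host:port
            if (parts[n] == port) print $7
        }'
}
```

Run as root, `netstat -nalp | listeners_on_port 1030` would be expected to print `10741/vnetd` for the situation described here.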