The following error message appears in the standby master pg_log:
2017-12-07 16:19:21.179732 GMT+5,,,p82774,th-1443289280,,,,0,,,seg-1,,,,,"ERROR","XX000","could not receive data from WAL stream: server closed the connection unexpectedly (gp_libpqwalreceiver.c:371)"," This probably means the server terminated abnormally before or while processing the request.",,,,,,0,,"gp_libpqwalreceiver.c",371,"Stack trace: 1 0xb0a80e postgres errstart + 0x4de 2 0x931354 postgres walrcv_receive + 0x2b4 3 0x936d5b postgres WalReceiverMain + 0x45b 4 0x5f7068 postgres AuxiliaryProcessMain + 0x818 5 0x8ef2f4 postgres <symbol not found> + 0x8ef2f4 6 0x8f315c postgres <symbol not found> + 0x8f315c 7 0x7fe2a64537e0 libpthread.so.0 <symbol not found> + 0xa64537e0 8 0x7fe2a59eb503 libc.so.6 __select + 0x13 9 0x8f9178 postgres <symbol not found> + 0x8f9178 10 0x8fc4c0 postgres PostmasterMain + 0xff0 11 0x7fdc9f postgres main + 0x44f 12 0x7fe2a5928d1d libc.so.6 __libc_start_main + 0xfd 13 0x4bfeb9 postgres <symbol not found> + 0x4bfeb9
The following error message appears in the master pg_log:
2017-12-08 14:14:17.582087 EST,"gpadmin",,p480717,th1272289088,"172.28.8.251","30785",2017-12-08 14:10:17 EST,0,con85,,seg-1,,,,,"LOG","00000","terminating walsender process due to replication timeout",,,,,,,0,,"walsender.c",703,
For Greenplum Database (GPDB) master mirroring, the replication_timeout GUC sets the maximum time, in milliseconds, that the walsender process on the active master waits for a status message from the walreceiver process on the standby master.
If no message is received within that time, the walsender process is terminated and the "replication timeout" message shown above is written to the master log.
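To gauge how often the timeout is being hit, the master log can be searched for the walsender message quoted above. The following is a minimal sketch, not an official tool: the function name count_replication_timeouts is hypothetical, and the log file path is an assumption that the caller must supply for their own environment.

```shell
#!/usr/bin/env bash
# Sketch: count replication-timeout terminations in one master log file.
# count_replication_timeouts is a hypothetical helper name.
# $1 is the path to a master pg_log file (an assumption; pass your own path).
count_replication_timeouts() {
    local log_file="$1"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'terminating walsender process due to replication timeout' "$log_file"
}
```

A count that keeps growing after the timeout has been raised suggests that the workaround described below is not sufficient on its own.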
In some cases, a viable workaround is to increase the value of the replication_timeout GUC (in milliseconds). Note that the default value for this GUC is one minute.
1. Check the current value of the GUC:
$ gpconfig -s replication_timeout
Values on all segments are consistent
GUC : replication_timeout
Master value: 1min
Segment value: 1min
2. Next, set the GUC to a higher value, for example 90 seconds:
$ gpconfig -c replication_timeout -v 90000ms
20171208:15:26:18:014230 gpconfig:mdw:gpadmin-[INFO]:-completed successfully with parameters '-c replication_timeout -v 90000ms'
3. Run gpstop -u to reload the configuration and pick up the change at runtime:
$ gpstop -u
20171208:15:26:23:014549 gpstop:mdw:gpadmin-[INFO]:-Starting gpstop with args: -u
20171208:15:26:23:014549 gpstop:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20171208:15:26:23:014549 gpstop:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20171208:15:26:23:014549 gpstop:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20171208:15:26:23:014549 gpstop:mdw:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.18.0 build 1'
20171208:15:26:23:014549 gpstop:mdw:gpadmin-[INFO]:-Signalling all postmaster processes to reload
4. Finally, verify that the new value is in effect:
$ gpconfig -s replication_timeout
Values on all segments are consistent
GUC : replication_timeout
Master value: 90s
Segment value: 90s
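The four steps above can be combined into a small script. This is only a sketch under stated assumptions: the function names master_value and set_replication_timeout are hypothetical, the parsing assumes the "Master value:" line format shown in the gpconfig -s output above, and the script assumes it is run as gpadmin on the master host with the Greenplum environment loaded. It uses only the gpconfig and gpstop invocations shown in the steps.

```shell
#!/usr/bin/env bash
# Sketch: apply the replication_timeout workaround in one go.
# Both function names are hypothetical helpers, not part of Greenplum.

# Read `gpconfig -s` output on stdin and print the master value (e.g. "1min").
# Assumes a line of the form "Master value: <value>".
master_value() {
    sed -n 's/^Master *value: *//p'
}

# $1 is the new timeout, e.g. 90000ms as in the steps above.
set_replication_timeout() {
    local new_value="$1"
    echo "before: $(gpconfig -s replication_timeout | master_value)"
    # Write the new value, then reload the configuration without a restart.
    gpconfig -c replication_timeout -v "$new_value" || return 1
    gpstop -u || return 1
    echo "after: $(gpconfig -s replication_timeout | master_value)"
}
```

After sourcing the script, running set_replication_timeout 90000ms would perform steps 1 through 4 and print the master value before and after the change.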
Note: This workaround does not always resolve the error. If it does not help, contact Pivotal Support for further assistance.