After executing 'gpstop -M fast', a postgres core is generated by the writer process. The database completes its shutdown. The following error is received:
Error Message:
Postgres core is generated with the following backtrace:
The core was generated by `postgres: port 16729, writer process '.
Program terminated with signal SIGABRT, Aborted.
#0 0x00002b5307d52495 in raise () from /data/logs/69603/packcore-postgres.86993.181669.core/lib64/libc.so.6
(gdb) thread apply all bt
Thread 1 (LWP 86993):
#0 0x00002b5307d52495 in raise () from /data/logs/69603/packcore-postgres.86993.181669.core/lib64/libc.so.6
#1 0x00002b5307d53c75 in abort () from /data/logs/69603/packcore-postgres.86993.181669.core/lib64/libc.so.6
#2 0x0000000000b05381 in errfinish (dummy=<optimized out>) at elog.c:689
#3 0x0000000000b06e39 in elog_finish (elevel=<optimized out>, fmt=<optimized out>) at elog.c:1466
#4 0x000000000095f919 in proc_exit_prepare (code=<optimized out>) at ipc.c:155
#5 proc_exit (code=0) at ipc.c:95
#6 0x0000000000c06560 in FileRepPrimary_IsMirroringRequired (fileRepRelationType=FileRepRelationTypeFlatFile, fileRepOperation=FileRepOperationWrite) at cdbfilerepprimary.c:253
#7 0x0000000000c06c4f in FileRepPrimary_MirrorWrite (fileRepIdentifier=..., fileRepRelationType=86993, offset=42958848, data=0x6 <error: Cannot access memory at address 0x6>, dataLength=4294967295, lsn=...) at cdbfilerepprimary.c:863
#8 0x0000000000c575aa in MirroredFlatFile_Write (open=0x1223b60 <mirroredLogFileOpen>, position=42958848, buffer=<optimized out>, bufferLen=32768, suppressError=<optimized out>) at cdbmirroredflatfile.c:648
#9 0x000000000055559a in XLogWrite (WriteRqst=..., flexible=<optimized out>, xlog_switch=<optimized out>) at xlog.c:2184
#10 0x0000000000556b59 in XLogFlush (record=...) at xlog.c:2406
#11 0x0000000000943ef3 in FlushBuffer (buf=0x2b531020bc20, reln=0x31ecff8) at bufmgr.c:2397
#12 0x0000000000946e5f in SyncOneBuffer (skip_pinned=<optimized out>, buf_id=<optimized out>) at bufmgr.c:2111
#13 BgBufferSync () at bufmgr.c:2044
#14 0x00000000008e0ed5 in BackgroundWriterMain () at bgwriter.c:344
#15 0x00000000005f5ff5 in AuxiliaryProcessMain (argc=-4, argv=0x7fff92e2d6a0) at bootstrap.c:483
#16 0x00000000008ee4e4 in StartChildProcess (type=<optimized out>) at postmaster.c:7992
#17 0x00000000008f459a in CommenceNormalOperations () at postmaster.c:4543
#18 0x00000000008f66ca in do_reaper () at postmaster.c:4980
#19 0x00000000008f9598 in ServerLoop () at postmaster.c:2437
#20 0x00000000008faf10 in PostmasterMain (argc=15, argv=0x31bd4a0) at postmaster.c:1540
#21 0x00000000007fcf7f in main (argc=15, argv=0x31bd430) at main.c:206
The following error messages may be noted in the gpstop logs:
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-There are 157 connections to the database
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='fast'
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Master host=mdw.randolph.ms.com
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Detected 157 connections to database
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Switching to WAIT mode
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Will wait for shutdown to complete, this may take some time if
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-there are a large number of active complex transactions, please wait...
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=fast
20171030:13:49:32:721637 gpstop:mdw:gpadmin-[INFO]:-Master segment instance directory=/var/gpdb/nypgp014/datamaster/gpseg-1
20171030:13:51:33:721637 gpstop:mdw:gpadmin-[INFO]:-Failed to shutdown master with pg_ctl.
20171030:13:51:33:721637 gpstop:mdw:gpadmin-[INFO]:-Sending SIGQUIT signal... <<<<
20171030:13:51:38:721637 gpstop:mdw:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process
20171030:13:51:38:721637 gpstop:mdw:gpadmin-[INFO]:-Terminating processes for segment /var/gpdb/nypgp014/datamaster/gpseg-1
20171030:13:51:38:721637 gpstop:mdw:gpadmin-[INFO]:-Stopping master standby host idb102.randolph.ms.com mode=fast
20171030:13:51:45:721637 gpstop:mdw:gpadmin-[INFO]:-Successfully shutdown standby process on idb102.randolph.ms.com
20171030:13:51:45:721637 gpstop:mdw:gpadmin-[INFO]:-Commencing parallel primary segment instance shutdown, please wait...
When gpstop -M fast is executed, all remaining queries on the master must complete before the "WAIT" timer expires. Queries that have not completed by then are forcefully terminated, which can happen before their changes finish replicating to the mirror or before the replication is logged. If the writer process does not have confirmation that mirror replication completed, it aborts with SIGABRT, which causes a core to be generated.
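As a rough aid for confirming the symptom, the sketch below scans gpstop log text for the forced-shutdown sequence shown in the log excerpt ("Failed to shutdown master with pg_ctl" followed by "Sending SIGQUIT signal") and reports how long gpstop waited before giving up. The helper names (split_entries, forced_master_shutdown) are hypothetical and not part of any Greenplum utility, and the timestamp layout (YYYYMMDD:HH:MM:SS:PID) is assumed from the excerpt only.

```python
import re
from datetime import datetime

# Assumed entry prefix, taken from the excerpt: "YYYYMMDD:HH:MM:SS:PID gpstop:"
ENTRY = re.compile(r"(\d{8}:\d{2}:\d{2}:\d{2}):\d+ gpstop:")

START = "Commencing Master instance shutdown"
FAILED = "Failed to shutdown master with pg_ctl"
SIGQUIT = "Sending SIGQUIT signal"


def split_entries(log_text):
    """Return (timestamp, message) pairs, even if several entries share a line."""
    matches = list(ENTRY.finditer(log_text))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        stamp = datetime.strptime(m.group(1), "%Y%m%d:%H:%M:%S")
        entries.append((stamp, log_text[m.end():end].strip()))
    return entries


def forced_master_shutdown(log_text):
    """Return seconds waited before the forced shutdown, or None if not seen."""
    start = failed = None
    saw_sigquit = False
    for stamp, message in split_entries(log_text):
        if START in message and start is None:
            start = stamp  # first shutdown announcement for the master
        elif FAILED in message:
            failed = stamp  # pg_ctl did not stop the master in time
        elif SIGQUIT in message and failed is not None:
            saw_sigquit = True  # gpstop escalated to SIGQUIT
    if start is None or failed is None or not saw_sigquit:
        return None
    return (failed - start).total_seconds()
```

Run against the log excerpt in this article, the function returns 121.0 (13:49:32 to 13:51:33). A return value of None only means the sequence was not found in the supplied text; it does not rule out the issue.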
If you believe you have encountered this issue, please open a ticket with Pivotal Support.
A defect has been opened with Pivotal Engineering to address this issue.