First, make sure all Greenplum processes are stopped on every segment host. Next, run gpstart -am (start the master only, without a confirmation prompt) to see which segments are marked down in the configuration. Then go to one of the down segments and check the primary and mirror logs for errors similar to the following:
2020-10-16 22:03:49.336450 GMT,,,p45140,th-443459712,,,2000-01-01 00:00:00 GMT,0,,,seg-1,,,,,"WARNING","XX000","receive EOF on connection: Success (cdbfilerepconnserver.c:333)",,,,,,,0,,"cdbfilerepconnserver.c",333,
2020-10-16 22:03:49.400075 GMT,,,p45147,th-443459712,,,,0,,,seg-1,,,,,"LOG","00000","'set segment state', mirroring role 'mirror role' mirroring state 'resync' segment state 'in fault' filerep state 'fault' process name(pid) 'mirror consumer append only process(45147)' 'cdbfilerep.c' 'L2444' 'FileRep_SetSegmentState'",,,,,,,0,,"cdbfilerep.c",1824,"Stack trace:
1 0x962a4b postgres errstart (elog.c:521)
2 0xa14954 postgres FileRep_InsertConfigLogEntryInternal (cdbfilerep.c:1806)
3 0xa17fdc postgres FileRepSubProcess_SetState (cdbfilerepservice.c:555)
4 0xa182b2 postgres FileRepSubProcess_ProcessSignals (cdbfilerepservice.c:281)
5 0xa22a4a postgres <symbol not found> (cdbfilerepmirror.c:660)
6 0xa262c8 postgres FileRepMirror_StartConsumer (cdbfilerepmirror.c:543)
7 0xa18755 postgres FileRepSubProcess_Main (cdbfilerepservice.c:826)
8 0xa12d86 postgres <symbol not found> (cdbfilerep.c:2667)
9 0xa17c59 postgres FileRep_Main (cdbfilerep.c:3620)
10 0x588e33 postgres AuxiliaryProcessMain (bootstrap.c:513)
11 0x7d5a9b postgres <symbol not found> (postmaster.c:7395)
12 0x7da41c postgres StartFilerepProcesses (postmaster.c:1622)
13 0x7e3f1f postgres doRequestedPrimaryMirrorModeTransitions (primary_mirror_mode.c:1760)
14 0x7deae1 postgres <symbol not found> (postmaster.c:2465)
15 0x7e0d2a postgres PostmasterMain (postmaster.c:1533)
16 0x4cbab7 postgres main (main.c:206)
17 0x7f7be0e48545 libc.so.6 __libc_start_main + 0xf5
18 0x4cc06c postgres <symbol not found> + 0x4cc06c
The corresponding primary (its log reports mirroring role 'primary role'):
2020-10-16 22:01:43.198151 GMT,,,p42438,th1885906816,,,,0,,,seg-1,,,,,"LOG","00000","'process exit, process name 'primary resync worker #2 process' process pid '42477' exit status '0' ', mirroring role 'primary role' mirroring state 'resync' segment state 'in backends shutdown' filerep state 'not initialized' process name(pid) 'filerep main process(42438)' 'cdbfilerep.c' 'L2114' 'LogChildExit'",,,,,,,0,,"cdbfilerep.c",1824,
2020-10-16 22:01:43.298282 GMT,,,p42438,th1885906816,,,,0,,,seg-1,,,,,"LOG","00000","'set segment state', mirroring role 'primary role' mirroring state 'resync' segment state 'in backends shutdown' filerep state 'not initialized' process name(pid) 'filerep main process(42438)' 'cdbfilerep.c' 'L2444' 'FileRep_SetSegmentState'",,,,,,,0,,"cdbfilerep.c",1824,"Stack trace:
1 0x962a4b postgres errstart (elog.c:521)
2 0xa14954 postgres FileRep_InsertConfigLogEntryInternal (cdbfilerep.c:1806)
3 0xa15234 postgres <symbol not found> (cdbfilerep.c:2444)
4 0xa17257 postgres FileRep_Main (cdbfilerep.c:3647)
5 0x588e33 postgres AuxiliaryProcessMain (bootstrap.c:513)
6 0x7d5a9b postgres <symbol not found> (postmaster.c:7395)
7 0x7da41c postgres StartFilerepProcesses (postmaster.c:1622)
8 0x7e3d19 postgres doRequestedPrimaryMirrorModeTransitions (primary_mirror_mode.c:1760)
9 0x7deae1 postgres <symbol not found> (postmaster.c:2465)
10 0x7e0d2a postgres PostmasterMain (postmaster.c:1533)
11 0x4cbab7 postgres main (main.c:206)
12 0x7fb86bbbd545 libc.so.6 __libc_start_main + 0xf5
13 0x4cc06c postgres <symbol not found> + 0x4cc06c
The stack traces show both segments in resync mode when they failed, which points to gpstop having been run while gprecoverseg was still in progress.
Product Version: 5.24
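If the logs show the same errors, you need to modify the gp_segment_configuration table. Connect to the master in utility mode (for example, PGOPTIONS='-c gp_session_role=utility' psql) and first take a backup of the table:
copy gp_segment_configuration to '/tmp/gp_segm.out';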
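Before editing the catalog, you may want to review the backup offline. This is a minimal sketch under two assumptions: the file uses COPY's default text format (tab-separated, \N for NULL), and the columns follow the Greenplum 5.x order listed in COLUMNS, which you should confirm with \d gp_segment_configuration. The function names are hypothetical. It lists rows whose role differs from preferred_role, the same check the select statement below runs against the live table.

```python
# Assumed column order of gp_segment_configuration in Greenplum 5.x;
# confirm it with \d gp_segment_configuration on your own system.
COLUMNS = ["dbid", "content", "role", "preferred_role", "mode",
           "status", "port", "hostname", "address", "replication_port"]


def parse_backup_lines(lines):
    """Parse COPY text-format lines (tab-separated, \\N = NULL) into dicts."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        values = [None if v == "\\N" else v for v in line.split("\t")]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def load_backup(path):
    """Read a backup file such as /tmp/gp_segm.out."""
    with open(path) as fh:
        return parse_backup_lines(fh)


def not_in_preferred_role(rows):
    """Offline equivalent of: where role <> preferred_role."""
    return [r for r in rows if r["role"] != r["preferred_role"]]
```

Usage: not_in_preferred_role(load_backup("/tmp/gp_segm.out")) returns the rows to look at; all values stay strings, so compare dbids as text.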
Then allow DML on the system catalog for this session:
set allow_system_table_mods=dml;
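Next, list the segments that are not running in their preferred role. Then, in a single transaction, mark the failed segments down (status 'd', mode 's') and put the segments that stay up as primaries into change-tracking mode (mode 'c'). The dbids below are from this particular cluster; substitute the ones from your own configuration:
select * from gp_segment_configuration where role <> preferred_role;
begin;
update gp_segment_configuration set status='d', mode='s' where dbid in (15,24,25);
update gp_segment_configuration set mode='c' where dbid in (39,48,49);
commit;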
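Because the dbids change from cluster to cluster, one way to avoid typos is to generate the statements from the two dbid lists and check that no dbid appears in both. The sketch below is a hypothetical helper (build_repair_sql is not part of Greenplum); it only produces text to review and paste into the utility-mode session, and it reproduces exactly the statements shown above.

```python
def build_repair_sql(down_dbids, change_tracking_dbids):
    """Build the catalog-repair statements for the given dbids (hypothetical helper)."""
    down = sorted({int(d) for d in down_dbids})
    tracking = sorted({int(d) for d in change_tracking_dbids})
    if not down or not tracking:
        raise ValueError("both dbid lists must be non-empty")
    overlap = set(down) & set(tracking)
    if overlap:
        # A segment cannot be marked down and kept in change tracking at once.
        raise ValueError(f"dbids appear in both lists: {sorted(overlap)}")

    def dbid_list(ids):
        return ",".join(str(i) for i in ids)

    return "\n".join([
        "set allow_system_table_mods=dml;",
        "begin;",
        f"update gp_segment_configuration set status='d', mode='s' where dbid in ({dbid_list(down)});",
        f"update gp_segment_configuration set mode='c' where dbid in ({dbid_list(tracking)});",
        "commit;",
    ])
```

For the example cluster, build_repair_sql([15, 24, 25], [39, 48, 49]) yields the same five statements used in this step.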
Then run gpstop, followed by gpstart.
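Once the cluster is up, decide whether you need a full or an incremental recovery based on the size of the pg_changetracking directory on the affected primaries, then run gprecoverseg -F (full) or gprecoverseg (incremental).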
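A rough way to measure pg_changetracking is to sum the file sizes under it. The sketch below assumes, as this step implies, that a very large change-tracking log favors a full recovery; the article gives no cut-off, so full_threshold_bytes is a value you choose yourself, and both function names are hypothetical.

```python
import os


def dir_size_bytes(path):
    """Total size of regular files under path (symlinks are skipped)."""
    total = 0
    # os.walk yields nothing for a missing path, so that case returns 0.
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def suggest_recovery(changetracking_bytes, full_threshold_bytes):
    """Return the gprecoverseg command to consider.

    full_threshold_bytes is a cut-off you choose; the article gives no number.
    Assumes a change-tracking log at or above it favors a full recovery.
    """
    if changetracking_bytes >= full_threshold_bytes:
        return "gprecoverseg -F"
    return "gprecoverseg"
```

Run dir_size_bytes on each affected primary's data directory, e.g. a hypothetical /data/primary/gpseg2/pg_changetracking, and pass the largest result to suggest_recovery.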