Note: The error "No such file or directory" may be reported for a number of different reasons. This article describes only one. Please refer to other articles if the symptoms do not match.
20220529:10:18:40:030226 gprecoverseg:e9a331119f23268:gpadmin-[INFO]:-Starting gprecoverseg with args: -aF
(...)
20220529:10:19:01:030226 gprecoverseg:e9a331119f23268:gpadmin-[INFO]:-Updating configuration to mark mirrors up
20220529:10:19:01:030226 gprecoverseg:e9a331119f23268:gpadmin-[INFO]:-Updating primaries
20220529:10:19:01:030226 gprecoverseg:e9a331119f23268:gpadmin-[INFO]:-Commencing parallel primary conversion of 1 segments, please wait...
20220529:10:34:46:030226 gprecoverseg:e9a331119f23268:gpadmin-[INFO]:-Process results...
20220529:10:34:46:030226 gprecoverseg:e9a331119f23268:gpadmin-[WARNING]:-Failed to inform primary segment of updated mirroring state. Segment: 8d7f44501e602d8:/data3/primary/gpseg60:content=60:dbid=62:mode=r:status=u: REASON: Conversion failed. stdout:"" stderr:"peer shut down connection before response was fully received
Retrying no 1
failure: Error: MirroringFailure
failure: Error: MirroringFailure
"

The above error occurs because the recovery target (the down segment that is being recovered) shuts down when filerep fails.
2022-05-29 10:24:44.230248 UTC,,,p10991,th-294840448,,,2000-01-01 00:00:00 UTC,0,,,seg-1,,,,,"WARNING","XX000","receive EOF on connection: Success (cdbfilerepconnserver.c:333)",,,,,,,0,,"cdbfilerepconnserver.c",333,
2022-05-29 10:24:44.304185 UTC,,,p10993,th-294840448,,,,0,,,seg-1,,,,,"LOG","00000","'set segment state', mirroring role 'mirror role' mirroring state 'resync' segment state 'in fault' filerep state 'fault' process name(pid) 'mirror consumer writer process(10993)' 'cdbfilerep.c' 'L2444' 'FileRep_SetSegmentState'",,,,,,,0,,"cdbfilerep.c",1824,"Stack trace:
1 0x96600b postgres errstart (elog.c:521)
2 0xa17ff4 postgres FileRep_InsertConfigLogEntryInternal (cdbfilerep.c:1806)
3 0xa1b67c postgres FileRepSubProcess_SetState (cdbfilerepservice.c:555)
4 0xa1b952 postgres FileRepSubProcess_ProcessSignals (cdbfilerepservice.c:281)

Filerep on the mirror shuts down because the primary segment (the recovery source) has a missing file and PANICs:
2022-05-29 10:24:44.222589 UTC,,,p15891,th-1524660352,,,,0,,,seg-1,,,,,"PANIC","58P01","could not open relation 1663/2379506/6450393: No such file or directory",,,,,,,0,,"md.c",1478,"Stack trace:
1 0x96600b postgres errstart (elog.c:521)
2 0x832e3f postgres <symbol not found> (md.c:1471)
3 0x8343ea postgres mdnblocks (md.c:1651)
4 0xa8d0fb postgres PersistentFileSysObj_MarkWholeMirrorFullCopy (cdbpersistentfilesysobj.c:4488)
5 0xa2e0e3 postgres FileRepPrimary_StartResyncManager (cdbfilerepresyncmanager.c:902)
6 0xa1bf08 postgres FileRepSubProcess_Main (cdbfilerepservice.c:867)
7 0xa16426 postgres <symbol not found> (cdbfilerep.c:2667)
8 0xa1b105 postgres FileRep_Main (cdbfilerep.c:3571)
9 0x58a7e3 postgres AuxiliaryProcessMain (bootstrap.c:513)
10 0x7d85db postgres <symbol not found> (postmaster.c:7406)
11 0x7dcf5c postgres StartFilerepProcesses (postmaster.c:1622)
12 0x7e68b9 postgres doRequestedPrimaryMirrorModeTransitions (primary_mirror_mode.c:1760)
13 0x7e1681 postgres <symbol not found> (postmaster.c:2465)
14 0x7e38ca postgres PostmasterMain (postmaster.c:1533)
15 0x4cdce7 postgres main (main.c:206)
16 0x7fb5a06fe555 libc.so.6 __libc_start_main + 0xf5
17 0x4ce29c postgres <symbol not found> + 0x4ce29c
"
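The three numbers in the PANIC message above follow the standard PostgreSQL relation path layout: tablespace OID, database OID, and relfilenode. As a minimal sketch, the values could be pulled out of a log line as shown here. The function name parse_relation_path is hypothetical and not part of any Greenplum tooling.

```shell
# Hypothetical helper: split the relation path from a
# "could not open relation T/D/R" message into its three parts.
parse_relation_path() {
  # Keep only the "N/N/N" triple that follows "could not open relation".
  triple=$(printf '%s\n' "$1" |
    sed -n 's#.*could not open relation \([0-9][0-9]*/[0-9][0-9]*/[0-9][0-9]*\).*#\1#p')
  # Fail if the message did not contain a relation path.
  [ -n "$triple" ] || return 1
  tablespace_oid=${triple%%/*}   # text before the first slash
  rest=${triple#*/}              # text after the first slash
  database_oid=${rest%%/*}
  relfilenode=${rest#*/}
  printf 'tablespace_oid=%s database_oid=%s relfilenode=%s\n' \
    "$tablespace_oid" "$database_oid" "$relfilenode"
}

parse_relation_path 'could not open relation 1663/2379506/6450393: No such file or directory'
```

The database OID and relfilenode obtained this way are the values to look up in pg_database and pg_class on the primary segment.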
$ PGOPTIONS='-c gp_session_role=utility' psql -h <primary_segment_host> -p <primary_segment_port>
psql (8.3.23)
Type "help" for help.

gpadmin=# SELECT oid,datname from pg_database where oid = 2379506;
   oid   | datname
---------+---------
 2379506 | prod
(1 row)

gpadmin=# \c prod
You are now connected to database "prod" as user "gpadmin".
prod=# SELECT * from pg_class where relfilenode = 6450393;
(0 rows)

This shows that the relfilenode does not exist in the catalog. The recovery process is trying to synchronize a file that does not exist on the filesystem and does not belong to any relation, which causes gprecoverseg to fail.
gpadmin=# SELECT * from gp_persistent_relation_node where database_oid=2379506 and relfilenode_oid=6450393;
(0 rows)
$ PGOPTIONS='-c gp_session_role=utility' psql -h <primary_segment_host> -p <primary_segment_port> -d <database_name>
# REINDEX TABLE pg_class;
# REINDEX TABLE pg_type;
# REINDEX TABLE pg_attribute;
# VACUUM pg_class;
# VACUUM pg_type;
# VACUUM pg_attribute;
# ANALYZE pg_class;
# ANALYZE pg_type;
# ANALYZE pg_attribute;
# REINDEX TABLE pg_class;
# REINDEX TABLE pg_type;
# REINDEX TABLE pg_attribute;
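The sequence above applies four passes (REINDEX, VACUUM, ANALYZE, REINDEX) to the same three catalog tables. As a sketch only, the statements could be generated in that order rather than typed by hand. The function name catalog_maintenance_sql is hypothetical.

```shell
# Hypothetical helper: print the catalog maintenance statements
# in the same order as the session above.
catalog_maintenance_sql() {
  for action in "REINDEX TABLE" "VACUUM" "ANALYZE" "REINDEX TABLE"; do
    for table in pg_class pg_type pg_attribute; do
      printf '%s %s;\n' "$action" "$table"
    done
  done
}

catalog_maintenance_sql
```

The output is plain SQL, so it can be reviewed first and then piped into the utility-mode psql connection shown above.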