If a segment host is down, gprecoverseg fails with:
Error occurred: Failed while trying to remove postmaster.pid
This error occurs even if there are other segments that can be recovered.
Extract from the gprecoverseg command output in this scenario:
20201005:15:46:43:004172 gprecoverseg:gpdb10:gpadmin-[INFO]:-3 segment(s) to recover
20201005:15:46:43:004172 gprecoverseg:gpdb10:gpadmin-[INFO]:-Ensuring 3 failed segment(s) are stopped
20201005:15:46:46:004172 gprecoverseg:gpdb10:gpadmin-[WARNING]:-Unable to determine if /data6/mirror/gpseg0 is symlink. Assuming it is not symlink
20201005:15:46:52:004172 gprecoverseg:gpdb10:gpadmin-[WARNING]:-Unable to determine if /data6/primary/gpseg1 is symlink. Assuming it is not symlink
20201005:15:46:56:004172 gprecoverseg:gpdb10:gpadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
20201005:15:46:58:004172 gprecoverseg:gpdb10:gpadmin-[ERROR]:-ExecutionError: 'non-zero rc: 255' occurred. Details: 'ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=60 gpdb12 ". /usr/local/greenplum-db-6.11.1/greenplum_path.sh; $GPHOME/sbin/gpoperation.py"' cmd had rc=255 completed=True halted=False
stdout=''
stderr='ssh: connect to host gpdb12 port 22: No route to host
'
Traceback (most recent call last):
  File "/usr/local/greenplum-db-6.11.1/lib/python/gppylib/commands/base.py", line 278, in run
    self.cmd.run()
  File "/usr/local/greenplum-db-6.11.1/lib/python/gppylib/operations/__init__.py", line 53, in run
    self.ret = self.execute()
  File "/usr/local/greenplum-db-6.11.1/lib/python/gppylib/operations/utils.py", line 50, in execute
    cmd.run(validateAfter=True)
  File "/usr/local/greenplum-db-6.11.1/lib/python/gppylib/commands/base.py", line 561, in run
    self.validate()
  File "/usr/local/greenplum-db-6.11.1/lib/python/gppylib/commands/base.py", line 609, in validate
    raise ExecutionError("non-zero rc: %d" % self.results.rc, self)
ExecutionError: ExecutionError: 'non-zero rc: 255' occurred. Details: 'ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=60 gpdb12 ". /usr/local/greenplum-db-6.11.1/greenplum_path.sh; $GPHOME/sbin/gpoperation.py"' cmd had rc=255 completed=True halted=False
stdout=''
stderr='ssh: connect to host gpdb12 port 22: No route to host
'
20201005:15:46:58:004172 gprecoverseg:gpdb10:gpadmin-[WARNING]:-Unable to clean up shared memory for stopped segments on host (gpdb12)
20201005:15:46:58:004172 gprecoverseg:gpdb10:gpadmin-[INFO]:-Updating configuration with new mirrors
20201005:15:46:58:004172 gprecoverseg:gpdb10:gpadmin-[INFO]:-Updating mirrors
20201005:15:46:58:004172 gprecoverseg:gpdb10:gpadmin-[INFO]:-Running pg_rewind on required mirrors
20201005:15:47:01:004172 gprecoverseg:gpdb10:gpadmin-[CRITICAL]:-Error occurred: Failed while trying to remove postmaster.pid.
Command was: 'ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=60 gpdb12 ". /usr/local/greenplum-db-6.11.1/greenplum_path.sh; rm -f /data6/primary/gpseg1/postmaster.pid"'
rc=255, stdout='', stderr='ssh: connect to host gpdb12 port 22: No route to host
'
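In output like the extract above, the host that is down can be identified from the ssh failure text in the stderr lines. The following is a minimal sketch of how that could be automated. It is not part of gprecoverseg or any Greenplum utility. The function name unreachable_hosts and the regular expression are illustrative assumptions based only on the message format shown in this extract.

```python
import re

# Assumption: ssh failures appear in the form seen in the extract, e.g.
#   ssh: connect to host gpdb12 port 22: No route to host
SSH_FAILURE = re.compile(r"ssh: connect to host (\S+) port (\d+): (.+)")


def unreachable_hosts(log_text):
    """Return {host: reason} for each ssh connection failure in log_text.

    Hypothetical helper for illustration only.
    """
    hosts = {}
    for line in log_text.splitlines():
        match = SSH_FAILURE.search(line)
        if match:
            host, _port, reason = match.groups()
            # Drop a trailing quote if the stderr value closes on the same line.
            hosts[host] = reason.strip().rstrip("'")
    return hosts


# Sample lines copied from the extract above.
sample = """\
20201005:15:46:56:004172 gprecoverseg:gpdb10:gpadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
stderr='ssh: connect to host gpdb12 port 22: No route to host
'
20201005:15:47:01:004172 gprecoverseg:gpdb10:gpadmin-[CRITICAL]:-Error occurred: Failed while trying to remove postmaster.pid.
rc=255, stdout='', stderr='ssh: connect to host gpdb12 port 22: No route to host
'
"""

print(unreachable_hosts(sample))
```

Run against the full extract, this reports only gpdb12, which matches the host named in the final CRITICAL message.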