gprecoverseg uses gp_primarymirror to check the state of each primary and mirror segment and determine whether the segments are "ready" for recovery.
Example:
20161005:10:42:00:062639 gprecoverseg:xxx:gpadmin-[DEBUG]:-[worker4] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' ecdlnjqgrpdb01 ". /bb/gpdata/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h ecdlnjqgrpdb01 -p 40001"' had result: cmd had rc=1 completed=True halted=False stdout='' stderr='mode: PrimarySegment segmentState: Ready dataState: InSync faultType: NotInitialized mode: PrimarySegment segmentState: Ready dataState: InSync faultType: NotInitialized
gprecoverseg expects this status to be returned on stderr in a specific format and parses the values it needs from that output.
If additional output from SSH is mixed into the response, gprecoverseg may assume that the segment is not "Ready", retry the operation several times, and finally terminate.
In this scenario, some X11 forwarding changes had been made to the gpadmin profile on several segments, causing various errors to be returned:
- expected response -
20161005:10:42:00:062639 gprecoverseg:xxx:gpadmin-[DEBUG]:-[worker4] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' ecdlnjqgrpdb01 ". /bb/gpdata/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h ecdlnjqgrpdb01 -p 40001"' had result: cmd had rc=1 completed=True halted=False
stdout=''
stderr='mode: PrimarySegment
...
- error message 1 -
20161005:10:42:01:062639 gprecoverseg:xxx:gpadmin-[DEBUG]:-[worker7] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' ecdlnjqgrpdb02 ". /bb/gpdata/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h ecdlnjqgrpdb02 -p 50003"' had result: cmd had rc=1 completed=True halted=False
stdout=''
stderr='/usr/bin/xauth: error in locking authority file /home/gpadmin/.Xauthority
- error message 2 -
20161005:10:42:01:062639 gprecoverseg:xxx:gpadmin-[DEBUG]:-[worker6] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' ecdlnjqgrpdb01 ". /bb/gpdata/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h ecdlnjqgrpdb01 -p 40002"' had result: cmd had rc=1 completed=True halted=False
stdout=''
stderr='Warning: No xauth data; using fake authentication data for X11 forwarding.
- exception -
20161005:10:42:06:062639 gprecoverseg:ecdlnjqgrpms01:gpadmin-[ERROR]:-gprecoverseg failed. exiting...
Traceback (most recent call last):
File "/bb/gpdata/greenplum-db/lib/python/gppylib/mainUtils.py", line 281, in simple_main_locked
exitCode = commandObject.run()
File "/bb/gpdata/greenplum-db/lib/python/gppylib/programs/clsRecoverSegment.py", line 1266, in run
raise Exception("Inconsistency in catalog and segment Role/Mode. Catalog Role = %s. Segment Mode = %s." % (db.getSegmentRole(), mode))
Exception: Inconsistency in catalog and segment Role/Mode. Catalog Role = p. Segment Mode = error in locking authority file /home/gpadmin/.Xauthority.
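The "Segment Mode" value in the exception is consistent with a parser that takes whatever follows the first colon on the first stderr line. The Python sketch below is a hypothetical illustration of that failure mode, not the actual gppylib code. The function names naive_mode and parse_status are invented for illustration, and the sketch assumes that each status field arrives on its own line and that the status lines still follow the xauth message.

```python
# Hypothetical illustration only; this is NOT the gppylib implementation.
EXPECTED_KEYS = {"mode", "segmentState", "dataState", "faultType"}


def naive_mode(stderr_text):
    """Return the text after the first colon on the first line (positional parse)."""
    first_line = stderr_text.splitlines()[0]
    return first_line.split(":", 1)[1].strip()


def parse_status(stderr_text):
    """Keep only lines whose key is one of the expected status fields."""
    status = {}
    for line in stderr_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in EXPECTED_KEYS:
            status[key.strip()] = value.strip()
    return status


# Assumed layout: one "key: value" pair per line.
clean = (
    "mode: PrimarySegment\n"
    "segmentState: Ready\n"
    "dataState: InSync\n"
    "faultType: NotInitialized\n"
)
# Assumed: the xauth message is emitted ahead of the normal status lines.
noisy = "/usr/bin/xauth: error in locking authority file /home/gpadmin/.Xauthority\n" + clean

print(naive_mode(clean))            # PrimarySegment
print(naive_mode(noisy))            # error in locking authority file /home/gpadmin/.Xauthority
print(parse_status(noisy)["mode"])  # PrimarySegment
```

With the noisy input, the positional parse yields the same string that appears as "Segment Mode" in the exception, while a key-based parse still finds PrimarySegment.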
To resolve this issue, disable X11 and agent forwarding for SSH in ~/.ssh/config:
Host *
    ForwardAgent no
    ForwardX11 no
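One way to confirm that the change took effect is to run a trivial command over SSH on each segment host and check that nothing is written to stderr. The following is a minimal Python sketch under stated assumptions: the helper name ssh_noise is hypothetical, the remote command "true" is chosen only because it produces no output of its own, and the run parameter exists so that the function can be exercised without a live host.

```python
import subprocess


def ssh_noise(host, run=subprocess.run):
    """Return any stderr text produced by a no-op SSH command on `host`.

    Hypothetical helper for illustration; it is not part of Greenplum.
    """
    result = run(
        # Same SSH option that appears in the gprecoverseg log lines.
        ["ssh", "-o", "StrictHostKeyChecking no", host, "true"],
        capture_output=True,
        text=True,
    )
    return result.stderr.strip()


if __name__ == "__main__":
    # Host names taken from the log excerpts above.
    for host in ["ecdlnjqgrpdb01", "ecdlnjqgrpdb02"]:
        noise = ssh_noise(host)
        print(host, "clean" if not noise else "unexpected stderr: " + noise)
```

An empty result for every host suggests that SSH is no longer adding xauth or X11 messages to stderr; any non-empty result is worth investigating before rerunning gprecoverseg.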