gprecoverseg fails due to X11 Forwarding

Article ID: 295630

Products

VMware Tanzu Greenplum

Issue/Introduction

Symptoms:

gprecoverseg uses gp_primarymirror to check the state of each primary and mirror segment and determine whether the segments are "ready" for recovery.


Example:

20161005:10:42:00:062639 gprecoverseg:xxx:gpadmin-[DEBUG]:-[worker4] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' ecdlnjqgrpdb01 ". /bb/gpdata/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h ecdlnjqgrpdb01 -p 40001"' had result: cmd had rc=1 completed=True halted=False
 stdout=''
stderr='mode: PrimarySegment
segmentState: Ready  
dataState: InSync
faultType: NotInitialized
mode: PrimarySegment
segmentState: Ready
dataState: InSync
faultType: NotInitialized

gprecoverseg expects this output on stderr in a specific format and parses the values it needs from it.

If SSH returns any additional output, gprecoverseg may conclude that the segment is not "Ready" and retry the operation several times before giving up and terminating.

 

Cause

In this scenario, X11 forwarding changes had been made to the gpadmin profile on several segment hosts. On those hosts, X11-related messages appeared in stderr, as the following log entries show:

- expected response -
20161005:10:42:00:062639 gprecoverseg:xxx:gpadmin-[DEBUG]:-[worker4] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' ecdlnjqgrpdb01 ". /bb/gpdata/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h ecdlnjqgrpdb01 -p 40001"' had result: cmd had rc=1 completed=True halted=False
 stdout=''
stderr='mode: PrimarySegment
...

- error message 1 -
20161005:10:42:01:062639 gprecoverseg:xxx:gpadmin-[DEBUG]:-[worker7] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' ecdlnjqgrpdb02 ". /bb/gpdata/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h ecdlnjqgrpdb02 -p 50003"' had result: cmd had rc=1 completed=True halted=False
 stdout=''
stderr='/usr/bin/xauth: error in locking authority file /home/gpadmin/.Xauthority

- error message 2 -
20161005:10:42:01:062639 gprecoverseg:xxx:gpadmin-[DEBUG]:-[worker6] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' ecdlnjqgrpdb01 ". /bb/gpdata/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h ecdlnjqgrpdb01 -p 40002"' had result: cmd had rc=1 completed=True halted=False
 stdout=''
stderr='Warning: No xauth data; using fake authentication data for X11 forwarding.

- exception -
20161005:10:42:06:062639 gprecoverseg:ecdlnjqgrpms01:gpadmin-[ERROR]:-gprecoverseg failed. exiting...
Traceback (most recent call last):
 File "/bb/gpdata/greenplum-db/lib/python/gppylib/mainUtils.py", line 281, in simple_main_locked
 exitCode = commandObject.run()
 File "/bb/gpdata/greenplum-db/lib/python/gppylib/programs/clsRecoverSegment.py", line 1266, in run
 raise Exception("Inconsistency in catalog and segment Role/Mode. Catalog Role = %s. Segment Mode = %s." % (db.getSegmentRole(), mode))
Exception: Inconsistency in catalog and segment Role/Mode. Catalog Role = p. Segment Mode = error in locking authority file /home/gpadmin/.Xauthority.
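
The "Segment Mode" value in the exception is consistent with a parser that takes the text after the first colon on the first line of stderr. The following is a minimal sketch of that behavior, not the actual gppylib code: the function name parse_mode and the combined "noisy" sample are assumptions made for illustration only.

```python
def parse_mode(stderr_text):
    """Return the text after the first ':' on the first non-empty line.

    Hypothetical stand-in for the parsing step; the real gppylib
    implementation may differ.
    """
    for line in stderr_text.splitlines():
        if line.strip():
            _, _, value = line.partition(":")
            return value.strip()
    return None


# Output in the format gprecoverseg expects.
clean = (
    "mode: PrimarySegment\n"
    "segmentState: Ready\n"
    "dataState: InSync\n"
    "faultType: NotInitialized\n"
)

# Assumed sample: the xauth error from the log placed ahead of the status.
noisy = (
    "/usr/bin/xauth: error in locking authority file /home/gpadmin/.Xauthority\n"
    + clean
)

print(parse_mode(clean))
print(parse_mode(noisy))
```

With the clean input the sketch returns PrimarySegment. With the xauth line first it returns "error in locking authority file /home/gpadmin/.Xauthority", which matches the mode reported in the exception.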

Resolution

To resolve this issue, disable X11 forwarding in the SSH client configuration file ~/.ssh/config:

Host *
 ForwardAgent no
 ForwardX11 no
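
After changing the configuration, one way to confirm that SSH sessions no longer write anything to stderr is to run a no-op command on each host and inspect the captured output. The sketch below assumes passwordless SSH is already set up for the user running it. The function names and the runner parameter are hypothetical helpers and are not part of any Greenplum utility.

```python
import subprocess


def ssh_stderr(host, runner=subprocess.run):
    """Run a no-op command over SSH and return what was written to stderr."""
    result = runner(
        ["ssh", "-o", "StrictHostKeyChecking no", host, "true"],
        capture_output=True,
        text=True,
    )
    return result.stderr.strip()


def noisy_hosts(hosts, runner=subprocess.run):
    """Map each host whose SSH session produced stderr output to that output."""
    found = {}
    for host in hosts:
        noise = ssh_stderr(host, runner)
        if noise:
            found[host] = noise
    return found
```

Calling noisy_hosts with the list of segment host names returns an empty dictionary when every session is clean. Any entry in the result, such as an xauth warning, is output that could interfere with the gp_primarymirror status check.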