gpbackup fails with "error exit status 255: Authentication failed"

Article ID: 296289

Products

VMware Tanzu Greenplum

Issue/Introduction

The gpbackup process fails at different stages of the backup operation, such as while creating the backup directories, with errors like the following:
20191001:10:11:38 gpbackup:gpadmin:segmenthost01.local:066040-[DEBUG]:-Unable to create backup directory /data/primary/gpseg23/backups/20191001/20191001101133 on segment 23 on host segmenthost01.local with error exit status 255: Authentication failed.
20191001:10:11:38 gpbackup:gpadmin:segmenthost01.local:066040-[DEBUG]:-Command was: ssh -o StrictHostKeyChecking=no [email protected] mkdir -p /data/primary/gpseg23/backups/20191001/20191001101133
You may see the following errors during the cleanup stages:
20191001:10:06:21 gpbackup:gpadmin:segmenthost01.local:063150-[DEBUG]:-Unable to execute command: cleanup_plugin_for_backup at: /usr/local/greenplum-db/./bin/gpbackup_s3_plugin, on: segment on segment 28 on host segmenthost01.local with error exit status 255: Authentication failed.
20191001:10:06:21 gpbackup:gpadmin:segmenthost01.local:063150-[DEBUG]:-Command was: ssh -o StrictHostKeyChecking=no [email protected] source /usr/local/greenplum-db/./greenplum_path.sh && /usr/local/greenplum-db/./bin/gpbackup_s3_plugin cleanup_plugin_for_backup /tmp/20191001100548_s3_backup.yaml /data2/primary/gpseg28/backups/20191001/20191001100548 segment \"28\"
20191001:10:06:21 gpbackup:gpadmin:segmenthost01.local:063150-[ERROR]:-Unable to execute command: cleanup_plugin_for_backup at: /usr/local/greenplum-db/./bin/gpbackup_s3_plugin, on: segment
However, if you run the commands reported above manually, they succeed:
ssh -o StrictHostKeyChecking=no [email protected] mkdir -p /data/primary/gpseg23/backups/20191001/20191001101133
Changing the sshd MaxStartups setting on the segment hosts does not resolve the issue.
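
Because the commands succeed when run one at a time, it can help to run many of them in parallel and count how many fail. The following is a minimal sketch, not part of gpbackup: the function name count_parallel_failures, the SEGMENT_HOST variable, and the parallel count of 50 are all assumptions chosen for illustration.

```shell
# Run the same command N times in parallel and print how many runs failed.
# Usage: count_parallel_failures <parallel_count> <command> [args...]
# (hypothetical helper, not shipped with Greenplum)
count_parallel_failures() {
  n="$1"; shift
  pids=""
  failures=0
  i=0
  while [ "$i" -lt "$n" ]; do
    # Start each run in the background and remember its PID.
    "$@" >/dev/null 2>&1 &
    pids="$pids $!"
    i=$((i + 1))
  done
  # Collect the exit status of every background run.
  for pid in $pids; do
    wait "$pid" || failures=$((failures + 1))
  done
  echo "$failures"
}

# Example (SEGMENT_HOST is a placeholder for one of your segment hosts):
# count_parallel_failures 50 ssh -o StrictHostKeyChecking=no "gpadmin@${SEGMENT_HOST}" true
```

A non-zero result from the parallel run, combined with a successful single run, suggests that the failure depends on concurrent ssh connections rather than on the command itself.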

Environment

Product Version: 5.7

Resolution

1. Check /var/log/secure on the segment hosts for possible causes of the authentication failure.

In this particular case, the log contained the following errors:
Oct  9 10:36:10 datanode01 sshd[49954]: pam_sss(sshd:account): Access denied for user gpadmin: 4 (System error)
Oct  9 10:36:10 datanode01 sshd[50427]: Received disconnect from 10.1.1.1 port 55132:11: disconnected by user
Oct  9 10:36:10 datanode01 sshd[50427]: Disconnected from 10.1.1.1 port 55132
Oct  9 10:36:10 datanode01 sshd[49954]: fatal: Access denied for user gpadmin by PAM account configuration [preauth]
Oct  9 10:36:10 datanode01 sshd[49900]: pam_unix(sshd:session): session closed for user gpadmin
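
To check quickly whether a segment host shows the same pattern, the log can be searched for the "Access denied for user" message seen in the excerpt above. This is a minimal sketch; the function name count_pam_denials is an assumption, and reading /var/log/secure usually requires root privileges.

```shell
# Print the number of log lines that report a PAM access denial for a user.
# Usage: count_pam_denials <log_file> <user>
# (hypothetical helper for illustration)
count_pam_denials() {
  log_file="$1"
  user="$2"
  # grep -c prints the count; it exits non-zero when the count is 0,
  # so "|| true" keeps the function's exit status clean.
  grep -c "Access denied for user ${user}" "$log_file" || true
}

# Example:
# count_pam_denials /var/log/secure gpadmin
```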
2. These messages show that the failure was reported by the SSSD daemon (pam_sss) while sshd was performing PAM authentication.

3. Disabling PAM authentication would fix the issue, but if PAM authentication cannot be disabled, the following parameter can be specified in /etc/sssd/sssd.conf:
selinux_provider=none
4. This is due to a bug reported for some Linux builds: on heavily loaded clients, ssh access can be randomly denied because of SELinux failures, even when SELinux is disabled on the client. Explicitly adding selinux_provider=none to sssd.conf avoids these failures.

This explains why the failure occurred during gpbackup execution: it occurs when multiple ssh commands run at the same time, but not when a single command is run manually.
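
After editing the file, it may be useful to confirm that the setting is present and not commented out. The sketch below rests on some assumptions: the function name has_selinux_provider_none is hypothetical, the option is expected to sit in a domain section of sssd.conf (for example under a [domain/...] header), and the restart command applies to systemd-based hosts.

```shell
# Return 0 if the file has an uncommented "selinux_provider = none" line.
# Usage: has_selinux_provider_none <path_to_sssd.conf>
# (hypothetical helper for illustration)
has_selinux_provider_none() {
  grep -Eq '^[[:space:]]*selinux_provider[[:space:]]*=[[:space:]]*none[[:space:]]*$' "$1"
}

# Example (reading sssd.conf usually requires root):
# has_selinux_provider_none /etc/sssd/sssd.conf && echo "set" || echo "not set"
#
# SSSD reads its configuration at startup, so restart it after the change,
# for example on a systemd-based host:
# systemctl restart sssd
```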

Note: This requirement for configuring the system for GPDB installation will be added to the Greenplum documentation.