gpssh sometimes fails to connect to one or more hosts and prints the message "[ERROR] unable to login to <hostname>".
Example:
[gpadmin@mdw ~]$ gpssh -f ~/gpconfigs/hostfile "hostname"
[ERROR] unable to login to sdw2
[ERROR] unable to login to sdw8
hint: use gpssh-exkeys to setup public-key authentication between hosts
[sdw4] sdw4
[sdw5] sdw5
[sdw6] sdw6
[sdw7] sdw7
[sdw1] sdw1
[sdw3] sdw3
[smdw] smdw
[ mdw] mdw
gpssh runs an ssh command with a login timeout of 10 seconds: it waits up to 10 seconds for ssh to respond, and if no response arrives in that time, it terminates the connection and prints the message shown above.
The following excerpt of the Python code (pxssh.py) shows this: gpssh first establishes an ssh connection, then sends "echo hello hello hello hello" and waits for that string to come back to confirm that it can receive a response.
[gpadmin@mdw ~]$ grep -A10 login_timeout /usr/local/greenplum-db/bin/lib/pxssh.py
    def loginAsync (self,server,username=None,login_timeout=10, port=None):
        cmd = 'ssh -o "BatchMode yes" -o "StrictHostKeyChecking no"'
        if port:
            cmd = cmd + ' -p %d' % port
        if username:
            cmd = cmd + ' -l %s' % username
        cmd = cmd + ' ' + server
        spawn.__init__(self, cmd, timeout=login_timeout)
        # we don't need this since we are not sending
        # password over (see comments in pexpect.py re: delaybeforesend)
        self.delaybeforesend = 0
    ### cktan: wait for login
    def loginWait(self, login_timeout=10, set_term_dumb=False):
        #, "(?i)no route to host"])
        echo = 'hello hello hello hello'
        self.sendline('echo ' + echo)
        exp = [echo, "(?i)permission denied", "(?i)terminal type", TIMEOUT, "(?i)connection closed by remote host", EOF]
        try:
            i = self.expect(exp)
            if i == 0:
                i = self.expect(exp)
--
    def login (self,server,username,login_timeout=10, port=22):
        self.loginAsync(server, username, login_timeout, port)
        return self.loginWait(login_timeout)
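To test a single host by hand in the same spirit as loginAsync/loginWait, here is a minimal sketch. It is not the pxssh implementation: it uses Python's subprocess module instead of pexpect, and it passes the echo as the remote command rather than typing it into an interactive shell. The names build_ssh_cmd and login_responds are hypothetical.

```python
import subprocess

# Same marker string that pxssh.loginWait sends and waits for.
ECHO = "hello hello hello hello"


def build_ssh_cmd(server, username=None, port=None):
    """Build an ssh command line modeled on the one loginAsync constructs."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "StrictHostKeyChecking=no"]
    if port:
        cmd += ["-p", str(port)]
    if username:
        cmd += ["-l", username]
    # Unlike pxssh, run the echo as the remote command in one step.
    cmd += [server, "echo " + ECHO]
    return cmd


def login_responds(cmd, login_timeout=10):
    """Return True only if cmd exits cleanly and prints ECHO within login_timeout seconds."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=login_timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising; gpssh would report
        # "unable to login" in the equivalent situation.
        return False
    return result.returncode == 0 and ECHO in result.stdout
```

For example, login_responds(build_ssh_cmd("sdw2")) returns False if sdw2 does not answer within 10 seconds. Raising login_timeout and seeing the call succeed suggests the host is slow to answer rather than unreachable.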
-- Retry the gpssh connection.
-- Check that you are able to connect to that host using
ssh <hostname>
and that there is no delay in the response or in returning to the prompt. If there is a delay, check your DNS server.
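-- Or, disable DNS lookups in sshd (when UseDNS is enabled, sshd tries to resolve the connecting client's IP address to a hostname). The setting should read:
[root@mdw /tmp]# grep DNS /etc/ssh/sshd_config
UseDNS no
Also add the client's address to the /etc/hosts file on the servers, and restart sshd so that the sshd_config change takes effect.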
-- Or, create a hostfile that lists the hosts by IP address rather than by hostname, so that each ssh connection does not have to go through the name resolution process.
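For the retry step, a small sketch that pulls the failed host names out of gpssh output and builds a gpssh command that targets only those hosts, using gpssh's -h option once per host. The regular expression is based only on the message format shown in the example above; failed_hosts and retry_command are hypothetical helper names.

```python
import re

# Matches lines such as "[ERROR] unable to login to sdw2".
FAILED_LOGIN = re.compile(r"\[ERROR\] unable to login to (\S+)")


def failed_hosts(gpssh_output):
    """Return hosts gpssh reported as unreachable, in first-seen order, without duplicates."""
    hosts = []
    for host in FAILED_LOGIN.findall(gpssh_output):
        if host not in hosts:
            hosts.append(host)
    return hosts


def retry_command(hosts, remote_cmd):
    """Build a gpssh command line that runs remote_cmd on the given hosts only."""
    cmd = ["gpssh"]
    for host in hosts:
        cmd += ["-h", host]  # gpssh accepts -h once per host
    cmd.append(remote_cmd)
    return cmd
```

The resulting list can be passed to subprocess.run, e.g. subprocess.run(retry_command(failed_hosts(output), "hostname")), where output is the captured text of the first gpssh run.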
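To look for the DNS delay mentioned in the ssh step, the sketch below times a forward lookup of a host name and a reverse lookup of each address it resolves to, using only the standard socket and time modules. It measures lookups from the machine it runs on; because sshd's reverse lookup of the client happens on the server, it may be worth running it on a segment host against the master's address as well. time_lookups is a hypothetical name.

```python
import socket
import time


def time_lookups(host):
    """Return seconds spent on the forward lookup of host and on each reverse lookup."""
    start = time.monotonic()
    infos = socket.getaddrinfo(host, 22, proto=socket.IPPROTO_TCP)
    timings = {"forward " + host: time.monotonic() - start}
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    for addr in sorted({info[4][0] for info in infos}):
        start = time.monotonic()
        try:
            socket.gethostbyaddr(addr)
        except OSError:
            pass  # no PTR record or lookup failure; the time spent still counts
        timings["reverse " + addr] = time.monotonic() - start
    return timings
```

Calling time_lookups("sdw2") and seeing values of several seconds, rather than milliseconds, points at the name resolution delay described above.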
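For the UseDNS step, a sketch that reads sshd_config text and reports the UseDNS value, plus a helper that lists the names /etc/hosts maps to a given address (for example, the master's IP on a segment host). It assumes the first UseDNS line wins, skips comment lines, and ignores Include files and Match blocks; on the server itself, running sshd -T as root prints the effective configuration and is the more reliable check. Both function names are hypothetical.

```python
def use_dns_setting(sshd_config_text):
    """Return the first UseDNS value ("yes"/"no") in sshd_config text, or None if unset."""
    for raw in sshd_config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split(None, 1)
        # sshd_config keywords are case-insensitive.
        if parts[0].lower() == "usedns" and len(parts) == 2:
            return parts[1].strip().lower()
    return None


def names_for_address(hosts_text, address):
    """Return the host names that /etc/hosts text maps to address (empty list if none)."""
    names = []
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop trailing comments
        if fields and fields[0] == address:
            names.extend(fields[1:])
    return names
```

Typical use is to read /etc/ssh/sshd_config and /etc/hosts on each server and pass their contents to these functions.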
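For the hostfile step, a sketch that rewrites a gpssh hostfile so each line holds an IPv4 address instead of a host name. Names are resolved once, on the machine running the script, with socket.gethostbyname (IPv4 only), so the addresses must be stable for the new file to stay valid. Skipping blank and # lines is an assumption about the hostfile format, and resolve_hostfile and write_ip_hostfile are hypothetical names.

```python
import socket


def resolve_hostfile(lines):
    """Return the IPv4 address for each host name in lines, skipping blanks and # lines."""
    addresses = []
    for raw in lines:
        name = raw.strip()
        if not name or name.startswith("#"):
            continue
        # Raises socket.gaierror if the name cannot be resolved.
        addresses.append(socket.gethostbyname(name))
    return addresses


def write_ip_hostfile(src_path, dst_path):
    """Read host names from src_path and write the resolved addresses to dst_path."""
    with open(src_path) as src:
        addresses = resolve_hostfile(src)
    with open(dst_path, "w") as dst:
        dst.write("\n".join(addresses) + "\n")
    return addresses
```

For example, write_ip_hostfile("/home/gpadmin/gpconfigs/hostfile", "/home/gpadmin/gpconfigs/hostfile_ip") followed by gpssh -f ~/gpconfigs/hostfile_ip "hostname" (the hostfile_ip name is only illustrative). gpssh will then label its output by IP address rather than by host name.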