There is a new AWS cluster. The SSH key has been copied to all hosts, and SSH works with every host. gpssh also works with a single host. However, gpssh to multiple hosts returns the following error for every host involved:
# gpssh -f hostfile
Note: command history unsupported on this machine ...
[ERROR] unable to login to sdw2
hint: use gpssh-exkeys to setup public-key authentication between hosts
[ERROR] unable to login to mdw
hint: use gpssh-exkeys to setup public-key authentication between hosts
[ERROR] unable to login to sdw1
hint: use gpssh-exkeys to setup public-key authentication between hosts
[ERROR] unable to login to sdw3
hint: use gpssh-exkeys to setup public-key authentication between hosts
[ERROR] unable to login to sdw4
hint: use gpssh-exkeys to setup public-key authentication between hosts
[ERROR] unable to login to smdw
hint: use gpssh-exkeys to setup public-key authentication between hosts
The customer chose RHEL 6.9 instances with 1 GB of memory each for the AWS hosts. In addition, /etc/sysctl.conf contains the setting "vm.overcommit_memory = 2".
When tracing the gpssh process, we can see the error code ENOMEM ('Cannot allocate memory').
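To see why such a small host runs out of allocatable memory: with vm.overcommit_memory = 2, the kernel refuses allocations once committed memory would exceed CommitLimit, which is swap plus vm.overcommit_ratio percent of RAM (50 by default). Assuming a 1 GB host with no swap and the default ratio, that is only about 512 MB for all processes together. The following is a minimal sketch for checking how much headroom is left; the function name commit_headroom_kb is hypothetical, and it only reads the standard CommitLimit and Committed_AS fields of /proc/meminfo.

```shell
#!/bin/sh
# Hypothetical helper (not part of Greenplum): print the remaining commit
# headroom in kB, i.e. CommitLimit minus Committed_AS.
# Reads /proc/meminfo by default; another meminfo-format file can be passed.
commit_headroom_kb() {
    awk '/^CommitLimit:/  { limit = $2 }
         /^Committed_AS:/ { used  = $2 }
         END              { print limit - used }' "${1:-/proc/meminfo}"
}

# Example usage on a host (uncomment to run):
#   cat /proc/sys/vm/overcommit_memory   # current overcommit mode
#   cat /proc/sys/vm/overcommit_ratio    # percent of RAM counted toward CommitLimit
#   commit_headroom_kb                   # kB that can still be committed
```

A small or negative value here, combined with overcommit mode 2, would be consistent with the ENOMEM failures seen when gpssh starts several sessions at once.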
Set vm.overcommit_memory to 1 on each host, or choose an AWS instance type with more memory. With a value of 1, Linux always allows memory overcommit, so the allocations succeed in this situation. The documentation suggests 16 GB of memory for a production system.
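One way to apply the setting change is sketched below. The function name set_overcommit_always is hypothetical, and the sketch assumes GNU sed (as shipped with RHEL) and root privileges when editing the real /etc/sysctl.conf. It rewrites an existing vm.overcommit_memory line or appends one if none exists.

```shell
#!/bin/sh
# Hypothetical helper: ensure the given sysctl config file contains
# "vm.overcommit_memory = 1". Defaults to /etc/sysctl.conf.
set_overcommit_always() {
    conf="${1:-/etc/sysctl.conf}"
    if grep -q '^[[:space:]]*vm\.overcommit_memory[[:space:]]*=' "$conf"; then
        # Replace the existing setting in place (GNU sed -i).
        sed -i 's/^[[:space:]]*vm\.overcommit_memory[[:space:]]*=.*/vm.overcommit_memory = 1/' "$conf"
    else
        # No existing setting: append one.
        echo 'vm.overcommit_memory = 1' >> "$conf"
    fi
}

# Example usage as root on each host (uncomment to run):
#   set_overcommit_always /etc/sysctl.conf
#   sysctl -p                            # reload settings from /etc/sysctl.conf
#   cat /proc/sys/vm/overcommit_memory   # should now print 1
```

Because gpssh itself is failing, the change would need to be made on each host individually (for example over plain ssh) before retrying gpssh -f hostfile.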