Customer has the following setup with multiple foundations:
The bbr-backup-director task running in foundation-B uses the bbr CLI to back up foundation-A. Because the foundation-A BOSH director runs in a private CIDR and is not directly reachable from foundation-B, the bbr task uses tunneling as described in the BOSH documentation.
To be specific, it sets up an SSH tunnel (stored in $BOSH_ALL_PROXY) to the BOSH director and allows toggling $NO_PROXY for connecting with either a local or remote BOSH director. The expected behavior is as follows.
If $NO_PROXY is set and includes the BOSH director IP, the bbr task is meant to connect to the director directly, bypassing $BOSH_ALL_PROXY.
If $NO_PROXY is not set, the bbr task should use the tunnel in $BOSH_ALL_PROXY, on the assumption that the target is a remote director that is not locally reachable.
At the time of writing (bbr v1.9.6), the bbr CLI does not respect $NO_PROXY; it respects only $no_proxy. It does not use the tunnel in $BOSH_ALL_PROXY if $no_proxy includes the BOSH director IP/CIDR.
Below is an example case hitting the problem:
The BOSH directors of foundation-A and foundation-B both sit in the private CIDR 172.###.###.0/24 and occupy the identical private IP 172.###.###.14.
$http_proxy, $https_proxy and $no_proxy are set, with $https_proxy containing the customer's enterprise proxy. The BOSH director private CIDR 172.###.###.0/24 is included in $no_proxy because, from the perspective of foundation-B, there is no need to use $https_proxy to reach the local BOSH director in foundation-B.
The setup is illustrated in the following diagram:
With this setup, the bbr-backup-director task running in foundation-B failed to back up the foundation-A BOSH director with the following error:
[bbr] 2021/03/22 08:10:07 INFO - Looking for scripts
1 error occurred:
error 1:
finding scripts failed on bosh/0: ssh.Run failed: ssh.Stream failed: ssh.Dial failed: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
The underlying cause lies in bosh-utils, the library that the bbr tool uses to connect to the BOSH director. bosh-utils does not respect $NO_PROXY; it respects only $no_proxy. In the example above, because $no_proxy covers the director IP, bbr bypasses the tunnel and connects to the local foundation-B BOSH director using the SSH key meant for the foundation-A BOSH director. Hence it hits the SSH authentication failure. Please refer to the bosh-utils code for details.
As a workaround, the customer can unset no_proxy or exclude the BOSH director IP/CIDR from no_proxy by modifying the export-director-metadata script. For example:
# https://github.com/pivotal-cf/bbr-pcf-pipeline-tasks/blob/master/scripts/export-director-metadata#L54-L58
# Set NO_PROXY for BOSH Director
if [ ! -z ${SET_NO_PROXY:+x} ] && [ $SET_NO_PROXY = true ]; then
  export NO_PROXY="${BOSH_ENVIRONMENT},${NO_PROXY:=${no_proxy:=}}"
  echo "exporting NO_PROXY=${NO_PROXY}"
fi
# unset no_proxy or export no_proxy with new value to exclude BOSH director
export no_proxy=<...>
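If unsetting no_proxy entirely is not an option because other entries are still needed, the director entries can be filtered out instead. The following is a minimal bash sketch under that assumption: remove_no_proxy_entries is a hypothetical helper, not part of export-director-metadata, and the addresses are documentation placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a comma-separated no_proxy style list ($1)
# with every entry that exactly equals one of the remaining arguments removed.
remove_no_proxy_entries() {
  local list=$1
  shift
  local -a entries kept=()
  local entry drop skip
  IFS=, read -r -a entries <<< "$list"
  for entry in "${entries[@]}"; do
    entry=${entry// /}                 # drop stray spaces
    [ -z "$entry" ] && continue
    skip=false
    for drop in "$@"; do
      [ "$entry" = "$drop" ] && skip=true
    done
    $skip || kept+=("$entry")
  done
  local IFS=,
  echo "${kept[*]}"                    # re-join the kept entries with commas
}

# Example with placeholder values:
remove_no_proxy_entries "localhost,192.0.2.0/24,.example.com" "192.0.2.0/24" "192.0.2.14"
```

In the script this could be applied as export no_proxy="$(remove_no_proxy_entries "${no_proxy:-}" "<DIRECTOR_CIDR>" "<DIRECTOR_IP>")", where both placeholders stand for the values used in the customer's environment.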
Troubleshooting tips:
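# Open a SOCKS proxy on local port 5000 through the remote Ops Manager VM
ssh -4 -D 5000 -NC "ubuntu@<REMOTE_OPSMAN_FQDN>" -i <REMOTE_OPSMAN_SSH_KEY> -o ServerAliveInterval=60 -o StrictHostKeyChecking=no &
# Verify SSH access to the remote BOSH director through that proxy
ssh -o ProxyCommand='nc -X 5 -x localhost:5000 %h %p' -i <BBR_USR_SSH_KEY> bbr@<REMOTE_BOSH_DIRECTOR_IP>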
# Run the bbr pre-backup check through the tunnel with no_proxy unset
export BOSH_ALL_PROXY=socks5://localhost:5000
echo $no_proxy
unset no_proxy
bbr director --host <REMOTE_BOSH_DIRECTOR_IP> --username bbr --private-key-path ./<BBR_USR_SSH_KEY> pre-backup-check
# On a BOSH director VM, watch the auth log to see which director accepts the bbr login
tail -n 0 -f /var/log/auth.log | grep "Accepted publickey for bbr from"
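Because the problem hinges on the case of the variable name, it can help to print every proxy-related variable in both cases before running bbr. The following is a minimal bash sketch; show_proxy_env is a hypothetical helper and the list of variable names is an assumption based on the variables mentioned in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each proxy-related variable, in both upper and
# lower case, and state explicitly when one is unset.
show_proxy_env() {
  local name
  for name in BOSH_ALL_PROXY NO_PROXY no_proxy HTTP_PROXY http_proxy HTTPS_PROXY https_proxy; do
    if [ -n "${!name+x}" ]; then       # indirect expansion: is $name set?
      printf '%s=%s\n' "$name" "${!name}"
    else
      printf '%s is unset\n' "$name"
    fi
  done
}

show_proxy_env
```

A no_proxy line that still lists the director IP or CIDR while BOSH_ALL_PROXY is set points to the situation described in this article.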