The TKGI cluster upgrade failed on Action "UPGRADE" with a "1 errored" state, specifically for the "run errand apply-addons" task.
The errand fails with: error: deployment "coredns" exceeded its progress deadline
The Kubernetes node name does not match the "Private IP DNS name" in the AWS instance.
The aws-cloud-controller-manager log (/var/vcap/sys/log/aws-cloud-controller-manager.stderr.log) will return errors like: "failed to get provider ID for node" and "failed to get instance ID from cloud provider: instance not found".
The AWS VPC uses a "DHCP Option Set" that references a custom "Domain Name".
TKGI clusters starting with 1.18 on AWS.
TKGM environments on AWS.
When the "--cloud-provider" variable is added to the kubelet and kube-proxy configurations, the "--hostname-override" variable will be ignored; this behavior is detailed in the upstream Kubernetes documentation. In these environments, "--cloud-provider=aws" must be applied if the Cloud Provider is AWS. If this variable is set and the environment uses custom domain names (configured in the AWS VPC DHCP Option Set), an additional variable must be added to the kubelet and kube-proxy configuration for "--hostname-override" to match the regional domain name assigned to the AWS Instance VM. If the "--hostname-override" value doesn't match the regional domain name of the AWS Instance VM, the node might not join the cluster, or, if it joins the cluster, it might never have the "node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule" taint removed, preventing pod scheduling on the updated nodes.
At a high level, this workaround uses Bosh's os-conf release "runtime config" to apply node-level file modifications in a pre-start script. The node-level change is a modification of the "--hostname-override" variable in the kubelet_ctl and kube_proxy_ctl job scripts, followed by a restart of the kubelet and kube-proxy monit processes. The "--hostname-override" variable is replaced with the reverse-lookup value of the eth0 IP address, which yields the DNS name configured on the AWS Instance. The os-conf "runtime" configuration presented below is applied to all [worker] instance groups in ALL Bosh deployments not explicitly excluded in the "exclude.deployments" section of the configuration.
bosh upload-release --sha1 daf34e35f1ac678ba05db3496c4226064b99b3e4 "https://bosh.io/d/github.com/cloudfoundry/os-conf-release?v=22.2.1"
bosh releases | grep os-conf
os-conf 22.2.1 a2154d6
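The exclude.deployments list in the runtime config below must reference actual Bosh deployment names in the environment; as a minimal sketch, list them first so the pivotal-container-service-<ID> and harbor-container-registry-<ID> placeholders can be replaced with the real names:
bosh deployments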
cat <<'EOFA' > runtime.yml
releases:
- name: "os-conf"
  version: "22.2.1"
addons:
- name: aws-dhcp-configuration
  exclude:
    deployments: [pivotal-container-service-<ID>,harbor-container-registry-<ID>]
  include:
    instance_groups: [worker]
  jobs:
  - name: pre-start-script
    release: os-conf
    properties:
      script: |-
        #!/bin/bash
        # Replace the metadata-service hostname lookup with a reverse DNS lookup of the eth0 IP address in kubelet_ctl
        sed -i "s_curl http://169.254.169.254/latest/meta-data/hostname_host \`ip a s eth0 \| grep inet \| awk -F\"inet \" '{print \$2}' \| awk -F\"/\" '{print \$1}'\` \| awk '{print \$5}' \| sed 's=.$=='_" /var/vcap/jobs/kubelet/bin/kubelet_ctl
        cat /var/vcap/jobs/kubelet/bin/kubelet_ctl | grep eth0
        # Apply the same substitution to kube_proxy_ctl
        sed -i "s_curl http://169.254.169.254/latest/meta-data/hostname_host \`ip a s eth0 \| grep inet \| awk -F\"inet \" '{print \$2}' \| awk -F\"/\" '{print \$1}'\` \| awk '{print \$5}' \| sed 's=.$=='_" /var/vcap/jobs/kube-proxy/bin/kube_proxy_ctl
        cat /var/vcap/jobs/kube-proxy/bin/kube_proxy_ctl | grep eth0
        # Restart the monit processes so the updated hostname value takes effect
        monit restart kube-proxy
        monit restart kubelet
        echo "done"
EOFA
bosh update-runtime-config runtime.yml
bosh runtime-config
tkgi upgrade-cluster <CLUSTER_NAME>
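After the upgrade completes, the change can be validated; the commands below are a minimal sketch assuming kubectl access and bosh ssh access to a worker in the upgraded cluster (service-instance_<ID> and worker/0 are placeholders):
bosh -d service-instance_<ID> ssh worker/0 -c "grep eth0 /var/vcap/jobs/kubelet/bin/kubelet_ctl"
kubectl get nodes -o wide
kubectl describe nodes | grep -i taint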