Symptoms:
- A Tanzu Kubernetes Grid (TKG) 1.2.1 deployment is stuck at step 4/8.
- You are deploying TKG in an air-gapped environment
- When you run kubectl -n cert-manager get po, you see that all pods are in the Pending state.
- When you run kubectl -n cert-manager describe po <pod name>, you see messages similar to the following:
0/X nodes are available: X node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate
- You see messages similar to the following in the output from the tkg init command:
I0120 12:17:46.445757 cert_manager.go:453] Waiting for cert-manager to be available...
I0120 12:17:46.456910 cert_manager.go:419] Updating Namespace="cert-manager-test"
I0120 12:17:46.603389 cert_manager.go:411] Creating Issuer="test-selfsigned" Namespace="cert-manager-test"
- When you log on to one of the control plane nodes and check the state of the nodes in the cluster (kubectl --kubeconfig /etc/kubernetes/admin.conf get no -o wide), you see that no worker nodes have joined yet.
- When you log on to one of the worker nodes, you see messages similar to the following in the journalctl output, indicating a name resolution problem:
Feb 09 07:22:15 tkg-kind-c0772pe440qsvjjovfbg-control-plane kubelet[733]: E0125 07:22:15.698512 733 remote_image.go:113] PullImage "registry.domain.local/newapp/cert-manager/cert-manager-controller:v0.16.1_vmware.1" from image service failed: rpc error: code = Unknown desc = failed to pull and unpack image "registry.domain.local/newapp/cert-manager/cert-manager-controller:v0.16.1_vmware.1": failed to resolve reference
- When you log on to one of the worker nodes and check the contents of the /etc/resolv.conf file, you don't see the expected nameserver entries. The same file on a control plane node does contain the expected nameserver entries.
- You have used a ytt overlay file at ~/.tkg/providers/infrastructure-vsphere/ytt/namesever.yml to specify the nameserver entries in the /etc/resolv.conf file.
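To make the /etc/resolv.conf comparison between a control plane node and a worker node quicker, you can copy the file from each node and compare only the nameserver lines. The following is a minimal Python sketch, assuming you have saved both copies locally. The script name compare_resolv.py, the functions parse_nameservers and missing_nameservers, and the sample addresses are hypothetical and for illustration only.

```python
import sys
from typing import List


def parse_nameservers(resolv_conf: str) -> List[str]:
    """Return the nameserver addresses listed in resolv.conf text, in order."""
    servers = []
    for line in resolv_conf.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def missing_nameservers(control_plane: str, worker: str) -> List[str]:
    """Return nameservers present on the control plane node but absent on the worker."""
    worker_servers = set(parse_nameservers(worker))
    return [s for s in parse_nameservers(control_plane) if s not in worker_servers]


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Usage: python compare_resolv.py cp-resolv.conf worker-resolv.conf
        with open(sys.argv[1]) as cp_file, open(sys.argv[2]) as wk_file:
            cp_text, wk_text = cp_file.read(), wk_file.read()
    else:
        # Hypothetical sample contents, for illustration only.
        cp_text = "nameserver 10.0.0.10\nnameserver 10.0.0.11\nsearch domain.local\n"
        wk_text = "nameserver 127.0.0.53\noptions edns0\n"
    missing = missing_nameservers(cp_text, wk_text)
    print("Missing on worker:", missing or "none")
```

Pass the two copied files as arguments (the file names above are hypothetical). A non-empty result is consistent with the symptom described above, where the worker node lacks the nameserver entries that the control plane node has.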