Tanzu Kubernetes Grid management cluster creation fails with cert-manager pods stuck in a pending state

Article ID: 317455


Products

VMware Tanzu Kubernetes Grid

Issue/Introduction

Symptoms:
  • Tanzu Kubernetes Grid (TKG) 1.2.1 deployment is stuck at step 4/8
  • You are deploying TKG in an air-gapped environment
  • When you run kubectl -n cert-manager get po you see that all pods are in a pending state
  • When you run kubectl -n cert-manager describe po <pod name> you see messages similar to the following:
0/X nodes are available: X node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate
  • You see messages similar to the following in the output from the tkg init command:

I0120 12:17:46.445757 cert_manager.go:453] Waiting for cert-manager to be available...
I0120 12:17:46.456910 cert_manager.go:419] Updating Namespace="cert-manager-test"
I0120 12:17:46.603389 cert_manager.go:411] Creating Issuer="test-selfsigned" Namespace="cert-manager-test" 

  • When you log on to one of the control plane nodes and check the states of the nodes in the cluster (kubectl --kubeconfig /etc/kubernetes/admin.conf get no -o wide), you see that there are no worker nodes joined yet.
  • When you log on to one of the worker nodes, you see messages similar to the following in journalctl output, indicating an issue with name resolution.
Feb 09 07:22:15 tkg-kind-c0772pe440qsvjjovfbg-control-plane kubelet[733]: E0125 07:22:15.698512     733 remote_image.go:113] PullImage "registry.domain.local/newapp/cert-manager/cert-manager-controller:v0.16.1_vmware.1" from image service failed: rpc error: code = Unknown desc = failed to pull and unpack image "registry.domain.local/newapp/cert-manager/cert-manager-controller:v0.16.1_vmware.1": failed to resolve reference
  • When you log on to one of the worker nodes and check the contents of the /etc/resolv.conf file, you don't see the proper nameserver entries. When you check the contents of the same file on a control plane node, it has the proper nameserver entries.
  • You have used a ytt overlay file at ~/.tkg/providers/infrastructure-vsphere/ytt/nameserver.yml to specify the nameserver entries in the /etc/resolv.conf file.
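To confirm this symptom, compare the resolver configuration on a control plane node and a worker node. The following is a minimal sketch; the check_resolv helper and the sample file are illustrative only (on a real node you would point it at /etc/resolv.conf):

```shell
# check_resolv: report whether a resolv.conf file contains any nameserver entries.
# Run it against /etc/resolv.conf on each node to compare control plane vs. worker.
check_resolv() {
  if grep -q '^nameserver ' "$1"; then
    echo "ok: nameserver entries present in $1"
  else
    echo "missing: no nameserver entries in $1"
  fi
}

# Illustration against a sample file with the entries expected on a healthy node:
printf 'nameserver 192.168.63.10\ndomain billdesk.local\n' > /tmp/resolv.sample
check_resolv /tmp/resolv.sample
```

A worker node affected by this issue reports the "missing" case, which is why image pulls from the private registry fail with name-resolution errors.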


Environment

VMware Tanzu Kubernetes Grid Plus 1.x
VMware Tanzu Kubernetes Grid 1.x

Resolution

You can use a ytt overlay file similar to the following to ensure that the proper nameserver entries are added to the /etc/resolv.conf file on all nodes, including the worker nodes:

~/.tkg/providers/infrastructure-vsphere/ytt/nameserver.yml

#@overlay/match by=overlay.subset({"kind":"KubeadmControlPlane"})
---
spec:
  kubeadmConfigSpec:
    preKubeadmCommands:
    #! Add nameserver to all k8s nodes
    #@overlay/append
    - echo 'nameserver 192.168.63.10' >> /usr/lib/systemd/resolv.conf
    #! Add domain to all k8s nodes
    #@overlay/append
    - echo 'domain billdesk.local' >> /usr/lib/systemd/resolv.conf
    #! Remove the resolv.conf from all the K8s nodes
    #@overlay/append
    - rm /etc/resolv.conf
    #! Create a new link file
    #@overlay/append
    - ln -s /usr/lib/systemd/resolv.conf /etc/resolv.conf
#@overlay/match by=overlay.subset({"kind":"KubeadmConfigTemplate"})
---
spec:
  template:
    spec:
      preKubeadmCommands:
      #! Add nameserver to all k8s nodes
      #@overlay/append
      - echo 'nameserver 192.168.63.10' >> /usr/lib/systemd/resolv.conf
      #! Add domain to all k8s nodes
      #@overlay/append
      - echo 'domain billdesk.local' >> /usr/lib/systemd/resolv.conf
      #! Remove the resolv.conf from all the K8s nodes
      #@overlay/append
      - rm /etc/resolv.conf
      #! Create a new link file
      #@overlay/append
      - ln -s /usr/lib/systemd/resolv.conf /etc/resolv.conf
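The preKubeadmCommands above run on each node before kubeadm starts. As a sanity check, their net effect can be replayed in a scratch directory; in this sketch the paths are relocated under a temporary directory so nothing on the local machine is touched, and the nameserver/domain values are the examples used in this article:

```shell
# Replay the overlay's preKubeadmCommands in a sandbox to inspect the result.
# On a real node these commands act on /usr/lib/systemd/resolv.conf and /etc/resolv.conf.
TMP=$(mktemp -d)
mkdir -p "$TMP/usr/lib/systemd" "$TMP/etc"
echo 'nameserver 127.0.0.53' > "$TMP/etc/resolv.conf"   # stand-in for the node's original file

# The overlay's commands, with paths relocated under $TMP:
echo 'nameserver 192.168.63.10' >> "$TMP/usr/lib/systemd/resolv.conf"
echo 'domain billdesk.local' >> "$TMP/usr/lib/systemd/resolv.conf"
rm "$TMP/etc/resolv.conf"
ln -s "$TMP/usr/lib/systemd/resolv.conf" "$TMP/etc/resolv.conf"

# /etc/resolv.conf is now a symlink containing the desired entries:
cat "$TMP/etc/resolv.conf"
```

Because the overlay file sits under ~/.tkg/providers/infrastructure-vsphere/ytt/, it is picked up automatically the next time you run tkg init against vSphere; no extra flag is needed.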