Tanzu Kubernetes Grid 1.2.1 deployment is stuck due to cert-manager pod creation issue



Article ID: 317044



Products

VMware Tanzu Kubernetes Grid 1.x

Issue/Introduction

Symptoms:
  • Tanzu Kubernetes Grid (TKG) 1.2.1 deployment is stuck at step 4/8
  • The TKG deployment is taking place in an air-gapped environment
  • You see messages similar to the following in the output of the tkg init command:

I0120 12:17:46.445757 cert_manager.go:453] Waiting for cert-manager to be available...
I0120 12:17:46.456910 cert_manager.go:419] Updating Namespace="cert-manager-test"
I0120 12:17:46.603389 cert_manager.go:411] Creating Issuer="test-selfsigned" Namespace="cert-manager-test"
.
.

I0120 12:47:48.637468 client.go:150] Deleting kind cluster: tkg-kind-c03t3ku440qoiqq7imrg
E0120 12:47:52.174348 common.go:40]
Error: : unable to set up management cluster: unable to initialize providers: timed out waiting for the condition, this can be possible because of the outbound connectivity issue. Please check deployed nodes for outbound connectivity.
E0120 12:47:52.174648 common.go:44]
Detailed log about the failure can be found at: /tmp/tkg-20210120T121642480345317.log

  • Running kubectl get all -A against the kind bootstrap cluster shows that the cert-manager pod fails to start
  • Running kubectl describe pod against the cert-manager pod shows ImagePullBackOff errors
  • You see errors similar to the following in the kubelet log on the kind cluster's control plane node:

Jan 25 07:22:15 tkg-kind-c0772pe440qsvjjovfbg-control-plane kubelet[733]: E0125 07:22:15.698512     733 remote_image.go:113] PullImage "registry.domain.local/newapp/cert-manager/cert-manager-controller:v0.16.1_vmware.1" from image service failed: rpc error: code = Unknown desc = failed to pull and unpack image "registry.domain.local/newapp/cert-manager/cert-manager-controller:v0.16.1_vmware.1": failed to resolve reference "registry.domain.local/newapp/cert-manager/cert-manager-controller:v0.16.1_vmware.1": get TLSConfig for registry "https://registry.domain.local": failed to load CA file: open /etc/containerd/tkg-registry-ca.crt: no such file or directory

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
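If you have saved the kubelet log from the kind control plane node to a file, a short script can confirm that the image pull failures match this issue. The following is a minimal Python sketch, not a VMware tool: the function name find_missing_ca_errors and its regular expression are illustrative assumptions derived only from the sample log line above, and other kubelet or containerd versions may word the message differently.

```python
import re
from typing import Iterable, List, Tuple

# Assumed signature, taken from the sample kubelet log line: a PullImage
# failure whose root cause is a missing registry CA file. Captures the image
# reference and the CA file path that containerd tried to open.
_MISSING_CA = re.compile(
    r'PullImage "(?P<image>[^"]+)".*'
    r'failed to load CA file: open (?P<path>[^:\s]+): no such file or directory'
)


def find_missing_ca_errors(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return an (image, ca_path) pair for every log line matching the signature."""
    hits = []
    for line in lines:
        match = _MISSING_CA.search(line)
        if match:
            hits.append((match.group("image"), match.group("path")))
    return hits
```

For example, find_missing_ca_errors(open("kubelet.log")) returns one pair per matching line, where kubelet.log is a hypothetical file name. An empty result suggests the pull failures have a different cause.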
 


Environment

VMware Tanzu Kubernetes Grid 1.x
VMware Tanzu Kubernetes Grid Plus 1.x

Resolution

This is a known issue affecting Tanzu Kubernetes Grid 1.2.1. There is currently no resolution.
 


Workaround:
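To work around the issue, add one of the following two sets of variables to the ~/.tkg/config.yaml file and re-run the tkg init command. The first set skips TLS verification for the registry (Cg== is the base64 encoding of a single newline character, used here as a placeholder CA certificate); the second keeps TLS verification enabled and supplies the CA certificate of your Harbor registry: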

TKG_CUSTOM_IMAGE_REPOSITORY: <your-harbor-fqdn>/library
TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY: true
TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE: Cg==


or

TKG_CUSTOM_IMAGE_REPOSITORY: <your-harbor-fqdn>/library
TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY: false
TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE: <base64 encoded harbor-ca-cert>
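The base64 value for TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE can be produced with any base64 encoder. The Python sketch below assembles either set of variables; the helper name tkg_registry_settings is hypothetical, and it assumes the CA certificate is supplied as the raw bytes of the Harbor CA PEM file, encoded as a whole onto a single line.

```python
import base64
from typing import Optional


def tkg_registry_settings(repository: str, ca_pem: Optional[bytes]) -> str:
    """Build the TKG_CUSTOM_IMAGE_REPOSITORY* lines for ~/.tkg/config.yaml.

    ca_pem=None mirrors the first set: TLS verification is skipped and the
    CA certificate is the placeholder "Cg==" (base64 of a single newline).
    Otherwise it mirrors the second set: TLS verification stays enabled and
    the PEM bytes are base64-encoded without line breaks.
    """
    if ca_pem is None:
        skip_verify = "true"
        ca_b64 = base64.b64encode(b"\n").decode("ascii")  # evaluates to "Cg=="
    else:
        skip_verify = "false"
        ca_b64 = base64.b64encode(ca_pem).decode("ascii")
    return (
        f"TKG_CUSTOM_IMAGE_REPOSITORY: {repository}\n"
        f"TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY: {skip_verify}\n"
        f"TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE: {ca_b64}\n"
    )
```

For example, tkg_registry_settings("harbor.example.com/library", Path("harbor-ca.crt").read_bytes()) (with from pathlib import Path) prints the second set; harbor.example.com and harbor-ca.crt are hypothetical names. Check ~/.tkg/config.yaml for existing TKG_CUSTOM_IMAGE_REPOSITORY entries before adding the output, so each key appears only once.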