TKG management cluster deletion fails, kube-apiserver can't connect to cert-manager webhook service

Article ID: 327441

Updated On:

Products

VMware

Issue/Introduction

Symptoms:
  • Management cluster deletion fails while waiting for cert-manager to become available
  • TKG is installed in an air-gapped environment that uses a proxy for internet connectivity
  • The bootstrap cluster is created and the cert-manager pods are running
  • Cluster API controller pods are never created

The kube-apiserver throws the following error:

W0709 12:41:07.546518       1 dispatcher.go:182] Failed calling webhook, failing closed webhook.cert-manager.io: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s": Forbidden



Cause

Confirm that the cert-manager webhook pods and service are healthy on the bootstrap cluster:

kubectl config use-context <Bootstrap cluster context>
kubectl get svc,pods,endpoints -n cert-manager


If they are all running but kube-apiserver can't reach the webhook service, it is likely that the connection is being routed through the proxy.

Review the no_proxy configuration on the jump box where the tanzu CLI command is run.
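
One way to review the setting is sketched below. The function name missing_no_proxy_entries is a hypothetical helper, not part of the tanzu CLI, and the sketch assumes the proxy settings are exposed as the usual no_proxy / NO_PROXY environment variables in the shell that runs tanzu.

```shell
# Hypothetical helper: print each required entry that is absent from a
# comma-separated no_proxy value. Prints nothing when all are present.
missing_no_proxy_entries() {
  value=$(printf '%s' "$1" | tr -d ' ')   # drop stray spaces around commas
  shift
  for entry in "$@"; do
    case ",$value," in
      *",$entry,"*) ;;                    # entry found as an exact list item
      *) printf '%s\n' "$entry" ;;        # entry is missing
    esac
  done
}

# Show the proxy variables currently set in this shell
env | grep -i '_proxy=' || echo "no proxy variables set"

# Report which of the two entries named in this article are missing
missing_no_proxy_entries "${no_proxy:-${NO_PROXY:-}}" localhost .svc
```

If the last command prints "localhost" or ".svc", that entry is absent from the current value.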

Resolution

Update the no_proxy configuration to include "localhost" and ".svc", then delete the management cluster.

The no_proxy setting on the jump box should include, but is not limited to:
  • localhost
  • .svc
  • IP or subnet of vCenter
  • IP or subnet of Internal Registries such as Harbor
  • Management cluster subnet
  • Workload cluster subnet
  • TKG Service subnet (SERVICE_CIDR)
  • TKG Cluster subnet (CLUSTER_CIDR)
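
As a rough sketch, the list above could be assembled into a single value like this. Every address below is a placeholder assumption (documentation and example ranges), not a value taken from this article, so substitute the addresses used in your environment. The two CIDR values are assumed to match your cluster configuration's SERVICE_CIDR and CLUSTER_CIDR.

```shell
# All values are placeholders; replace them with your environment's addresses.
VCENTER="192.0.2.10"             # IP or subnet of vCenter
REGISTRY="192.0.2.20"            # IP or subnet of the internal registry (e.g. Harbor)
MGMT_SUBNET="10.10.0.0/24"       # management cluster subnet
WORKLOAD_SUBNET="10.20.0.0/24"   # workload cluster subnet
SERVICE_CIDR="100.64.0.0/13"     # must match SERVICE_CIDR in the cluster config
CLUSTER_CIDR="100.96.0.0/11"     # must match CLUSTER_CIDR in the cluster config

# Build the comma-separated list, starting with the two required entries
no_proxy="localhost,.svc,${VCENTER},${REGISTRY},${MGMT_SUBNET},${WORKLOAD_SUBNET},${SERVICE_CIDR},${CLUSTER_CIDR}"

# Export both spellings, since tools differ in which one they read
export no_proxy
export NO_PROXY="$no_proxy"

echo "$no_proxy"
```

Run this in the same shell session that will run the tanzu command, or add the exports to the shell profile on the jump box so they persist.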

An old bootstrap cluster may still be present. Remove it first, then delete the management cluster:

docker ps
docker rm -v <Bootstrap Cluster name> -f
tanzu management-cluster delete