Failed to create new bootstrap token: Internal error occurred: failed calling webhook rancher.cattle.io

Article ID: 380221

Products

VMware vSphere with Tanzu

Issue/Introduction

Guest cluster nodes are not created when admission webhooks (such as Rancher's) are installed in the guest cluster.

The webhooks intercept Secret events. If all the worker nodes are down, the webhook pods are unreachable,
and the control plane rollout or creation gets stuck.

The Machine and wcpMachine objects are created, but the corresponding virtualMachine objects and the Guest Cluster VMs in vCenter Server are not. Additionally:

  • The capw-controller-manager log shows:

    bootstrap data is not available

  • The kubeadmConfig for the node is missing the datavalueSecret field in its status

  • The capi-kubeadm-bootstrap-controller log shows:

    controller/kubeadmconfig "msg"="Reconciler error" "error"="failed to create new bootstrap token: Internal error occurred: failed calling webhook rancher.cattle.io: ..."
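
To pull the failing webhook name out of a long controller log, a small filter along these lines may help. This is a sketch only: failed_webhooks is a hypothetical helper name, and it assumes the error text has the same shape as the log line above (with or without quotes around the webhook name).

```shell
#!/bin/sh
# Hypothetical helper: print the unique webhook names that appear in
# "failed calling webhook ..." errors. Reads controller log text on stdin.
failed_webhooks() {
  # Keep only the error fragment up to the end of the webhook name.
  grep -o 'failed calling webhook [^ ]*' |
    # Drop the fixed prefix, leaving the (possibly quoted) name.
    sed 's/^failed calling webhook //' |
    # Strip backslashes, double quotes, and the trailing colon.
    tr -d '\\":' |
    sort -u
}
```

The function reads from standard input, so the output of kubectl logs for the capi-kubeadm-bootstrap-controller pod can be piped into it. The namespace and pod name depend on the environment and are not given here.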

Environment

vSphere with Tanzu

Cause

The kubeadmconfig controller (part of capi-kubeadm-bootstrap-controller) tries to create a bootstrap token secret in the guest cluster before it generates the datavalueSecret that cloud-init uses to join the node.

Because the worker nodes are down, no webhook pods are running.
Connections to the Rancher webhooks therefore fail, and the failed webhook calls block the creation of the token secret.
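
To see which webhook configurations in the guest cluster cover Secrets, a check like the following could be used. It is a rough sketch, not a definitive method: webhooks_matching_secrets is a hypothetical helper name, kubectl is assumed to be pointed at the guest cluster, and the exact text that kubectl prints for the resources list can vary between versions, so the match is deliberately loose.

```shell
#!/bin/sh
# Hypothetical helper: list the webhook configurations of one kind whose
# rules mention "secrets" or a "*" wildcard.
# $1: validatingwebhookconfigurations or mutatingwebhookconfigurations
webhooks_matching_secrets() {
  # Print one line per configuration: name, a tab, then every rule's resources.
  kubectl get "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.webhooks[*].rules[*].resources}{"\n"}{end}' |
    # Match on the resources column only, so names containing "secrets" are ignored.
    awk -F'\t' '$2 ~ /secrets|\*/ { print $1 }'
}
```

For example, webhooks_matching_secrets validatingwebhookconfigurations prints one configuration name per line. A wildcard rule is reported even if its API group excludes core resources, so treat the output as a list of candidates to inspect.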

Resolution

  1. Identify the names of the failing webhooks in the capi-kubeadm-bootstrap-controller log. These are the webhook configurations to delete.
    Example:
    controller/kubeadmconfig "msg"="Reconciler error" "error"="failed to create new bootstrap token: Internal error occurred: failed calling webhook rancher.cattle.io:

  2. Back up the current validatingwebhookconfigurations and mutatingwebhookconfigurations that watch secret events.
    Example:
    kubectl get validatingwebhookconfigurations rancher.cattle.io -o yaml > rancher-cattle-io.yaml

  3. Delete the validatingwebhookconfigurations and mutatingwebhookconfigurations
    Example:
    kubectl delete validatingwebhookconfigurations rancher.cattle.io

  4. After the webhook configurations are successfully deleted, the bootstrap token secret can be created and the control plane nodes are reconciled.

  5. Once the cluster is back in the running state, recreate the validatingwebhookconfigurations and mutatingwebhookconfigurations from the backups that were taken in step 2.
    Example:
    kubectl apply -f rancher-cattle-io.yaml
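
The backup, delete, and restore steps above can be sketched as three small shell functions. This is an illustrative sketch rather than a supported tool: the function names and the backup file naming are hypothetical, kubectl is assumed to target the guest cluster, and it assumes that a mutating configuration, if one exists, has the same name as the validating one.

```shell
#!/bin/sh
# Both kinds of webhook configuration are handled the same way.
KINDS="validatingwebhookconfigurations mutatingwebhookconfigurations"

# Save each existing configuration named $1 as YAML in directory $2.
backup_webhooks() {
  name=$1; dir=$2
  mkdir -p "$dir" || return 1
  for kind in $KINDS; do
    # Skip kinds that have no object with this name.
    kubectl get "$kind" "$name" >/dev/null 2>&1 || continue
    kubectl get "$kind" "$name" -o yaml > "$dir/$kind-$name.yaml" || return 1
  done
}

# Delete only the configurations that have a non-empty backup file.
delete_webhooks() {
  name=$1; dir=$2
  for kind in $KINDS; do
    [ -s "$dir/$kind-$name.yaml" ] || continue
    kubectl delete "$kind" "$name" || return 1
  done
}

# Recreate the configurations from the backup files.
restore_webhooks() {
  name=$1; dir=$2
  for kind in $KINDS; do
    [ -s "$dir/$kind-$name.yaml" ] || continue
    kubectl apply -f "$dir/$kind-$name.yaml" || return 1
  done
}
```

A possible sequence is backup_webhooks rancher.cattle.io ./webhook-backup followed by delete_webhooks rancher.cattle.io ./webhook-backup, then restore_webhooks rancher.cattle.io ./webhook-backup once the cluster is running again. Because the delete step requires a backup file to exist, a configuration is never removed without a saved copy.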