Resolving ClusterBootstrapFailed: Add-on Reconciliation Failures during Cluster Lifecycle Operations

Article ID: 429609

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • When provisioning or upgrading a Kubernetes workload cluster in a managed environment, the cluster's Ready condition remains False. Specifically, the ClusterBootstrap conditions indicate that one or more add-on components (e.g., CNI, CSI, or cloud provider) have failed to reconcile.
  • kubectl get cluster shows the cluster is not in a Ready state.
  • kubectl describe clusterbootstrap <cluster-name> reveals messages such as:

ClusterBootstrap conditions <Addon-Name>-ReconcileFailed indicate reconcile has failed

  • Add-on DaemonSets or Deployments within the workload cluster show a discrepancy between Desired and Up-to-Date pod counts.
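
When several add-ons fail at once, the add-on names can be pulled out of the describe output with a small filter. The following is a minimal sketch that assumes the message format quoted above; the function name failed_addons is hypothetical and not part of any product tooling.

```shell
# failed_addons: read `kubectl describe clusterbootstrap` output on stdin and
# print each add-on name that appears in a <Addon-Name>-ReconcileFailed condition.
# Assumption: the condition names follow the message format quoted in this article.
failed_addons() {
  grep -oE '[A-Za-z0-9._-]+-ReconcileFailed' | sed 's/-ReconcileFailed$//' | sort -u
}

# Usage (run against the management cluster):
#   kubectl describe clusterbootstrap CLUSTER_NAME | failed_addons
```

The names it prints indicate which DaemonSets or Deployments to inspect inside the workload cluster.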

Environment

  • VMware vSphere Kubernetes Service

Cause

  • The ClusterBootstrapFailed error typically occurs when the management plane successfully provisions the virtual infrastructure, but the subsequent installation or update of essential services fails.
  • Common causes include:

OnDelete Update Strategy: The add-on DaemonSet is configured with an updateStrategy of OnDelete. With this strategy, the DaemonSet controller creates pods from the updated template only after the existing pods are deleted manually, so the reconciliation process waits indefinitely.

Image Pull Secrets/Connectivity: Workload nodes cannot reach the container registry, or lack valid image pull credentials, and therefore cannot pull the new version of the add-on image.

Resource Constraints: Nodes lack sufficient CPU or memory to schedule the updated add-on pods.

Taints and Tolerations: New pods cannot be scheduled because the nodes have taints that the add-on pods do not tolerate (common during upgrades).
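
The last three causes usually show up in the pod STATUS column. The following sketch maps common statuses to the causes listed above. It assumes the default column layout of kubectl get pods -n <namespace> (NAME, READY, STATUS, RESTARTS, AGE); the function name triage_pods and the hint wording are illustrative only.

```shell
# triage_pods: read `kubectl get pods -n NAMESPACE` output on stdin and print
# a likely cause for pods that are not running.
# Assumption: default column layout, so STATUS is the third field.
triage_pods() {
  awk 'NR > 1 {
    hint = ""
    if ($3 == "ImagePullBackOff" || $3 == "ErrImagePull")
      hint = "registry connectivity or image pull secret"
    else if ($3 == "Pending")
      hint = "unschedulable, check resources, taints and tolerations"
    if (hint != "") print $1 ": " hint
  }'
}

# Usage (inside the workload cluster):
#   kubectl get pods -n NAMESPACE | triage_pods
```

For a Pending pod, the events at the end of kubectl describe pod normally state why scheduling failed, which helps distinguish a resource shortage from an untolerated taint.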

Resolution

  • Identify the Stuck Component:

    Log into the workload cluster and identify which system component is failing to update:

kubectl get daemonset -A

Look for components where the UP-TO-DATE value is less than the DESIRED value.
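
On clusters with many DaemonSets, the comparison can be automated. This is a minimal sketch that assumes the default column order of kubectl get daemonset -A, in which DESIRED is the third column and UP-TO-DATE is the sixth; the function name stale_daemonsets is hypothetical.

```shell
# stale_daemonsets: read `kubectl get daemonset -A` output on stdin and print
# every DaemonSet whose UP-TO-DATE count is lower than its DESIRED count.
# Assumption: default columns NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE ...
stale_daemonsets() {
  awk 'NR > 1 && ($6 + 0) < ($3 + 0) {
    print $1 "/" $2 " desired=" $3 " up-to-date=" $6
  }'
}

# Usage:
#   kubectl get daemonset -A | stale_daemonsets
```

Adding 0 to each field forces a numeric comparison, so a pair such as 9 and 10 is compared as numbers rather than as strings.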

  • Check the Update Strategy:

    Check if the component is waiting for a manual trigger due to an OnDelete strategy:

kubectl get ds <component-name> -n <namespace> -o jsonpath='{.spec.updateStrategy.type}'

If the output is OnDelete, the pods will not update until the old ones are removed.
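
To check every DaemonSet in one pass instead of one component at a time, the same field can be listed with custom columns. A sketch, assuming kubectl access to the workload cluster; the function name ondelete_daemonsets is hypothetical.

```shell
# ondelete_daemonsets: print namespace/name for every DaemonSet in the cluster
# whose update strategy is OnDelete.
ondelete_daemonsets() {
  kubectl get daemonset -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,STRATEGY:.spec.updateStrategy.type' |
    awk '$3 == "OnDelete" { print $1 "/" $2 }'
}

# Usage:
#   ondelete_daemonsets
```

An empty result suggests that the stuck rollout has a different cause, such as image pulls or scheduling.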

  • Trigger the Reconciliation:

    To resolve a stuck OnDelete reconciliation, use one of the following methods:

Method A: Manual Pod Deletion (Immediate)

Force the update by deleting the existing pods. The DaemonSet controller will immediately recreate them using the new configuration:

kubectl delete pods -n <namespace> -l <label-selector-for-component>
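
Deleting every pod at once restarts the component on all nodes simultaneously. Where that is a concern, for example with a CNI agent, the pods can be replaced one at a time. The sketch below is one possible approach rather than a documented procedure; the function name, the POLL_SECONDS and MAX_TRIES variables, and their defaults are assumptions.

```shell
# delete_pods_one_by_one NAMESPACE DAEMONSET LABEL_SELECTOR
# Deletes matching pods one at a time and waits for the DaemonSet to report all
# pods Ready before moving on. Returns non-zero if a delete fails or the wait
# exceeds MAX_TRIES polls.
delete_pods_one_by_one() {
  ns="$1"; ds="$2"; selector="$3"
  for pod in $(kubectl get pods -n "$ns" -l "$selector" -o name); do
    kubectl delete -n "$ns" "$pod" || return 1
    tries=0
    while :; do
      # Sleep first so the DaemonSet status has a moment to reflect the deletion.
      sleep "${POLL_SECONDS:-5}"
      ready=$(kubectl get ds "$ds" -n "$ns" -o jsonpath='{.status.numberReady}')
      desired=$(kubectl get ds "$ds" -n "$ns" -o jsonpath='{.status.desiredNumberScheduled}')
      [ "${ready:-0}" = "${desired:-0}" ] && break
      tries=$((tries + 1))
      [ "$tries" -ge "${MAX_TRIES:-60}" ] && return 1
    done
  done
}

# Usage:
#   delete_pods_one_by_one NAMESPACE COMPONENT_NAME LABEL_SELECTOR
```

This trades speed for availability: the rollout takes longer, but only one node at a time runs without the add-on pod.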

Method B: Patch to RollingUpdate (Recommended)

Change the strategy so that Kubernetes manages the rollout automatically in the future:

kubectl patch ds <component-name> -n <namespace> -p '{"spec":{"updateStrategy":{"type":"RollingUpdate"}}}'
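
Before patching, it can be worth confirming that the DaemonSet really uses OnDelete, so that the patch is not applied to a component that is stuck for another reason. A minimal sketch, with a hypothetical function name:

```shell
# patch_if_ondelete NAMESPACE DAEMONSET
# Applies the RollingUpdate patch only when the current strategy is OnDelete.
patch_if_ondelete() {
  ns="$1"; ds="$2"
  current=$(kubectl get ds "$ds" -n "$ns" -o jsonpath='{.spec.updateStrategy.type}') || return 1
  if [ "$current" = "OnDelete" ]; then
    kubectl patch ds "$ds" -n "$ns" -p '{"spec":{"updateStrategy":{"type":"RollingUpdate"}}}'
  else
    echo "daemonset $ns/$ds already uses ${current:-an unknown strategy}; nothing to patch"
  fi
}

# Usage:
#   patch_if_ondelete NAMESPACE COMPONENT_NAME
```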

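  • Monitor the rollout until UP-TO-DATE matches DESIRED. After Method A alone, re-run kubectl get daemonset -A and compare the two columns, because kubectl rollout status supports DaemonSets only with the RollingUpdate strategy. After Method B, use: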

kubectl rollout status ds/<component-name> -n <namespace>
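
kubectl rollout status reports on a DaemonSet only when its update strategy is RollingUpdate. If the DaemonSet is still set to OnDelete, polling the status fields directly is one alternative. The sketch below compares updatedNumberScheduled with desiredNumberScheduled; the function name and the POLL_SECONDS and MAX_TRIES variables are assumptions.

```shell
# wait_ds_updated NAMESPACE DAEMONSET
# Polls the DaemonSet status until the number of updated pods equals the number
# desired. Returns non-zero on a kubectl error or after MAX_TRIES polls.
wait_ds_updated() {
  ns="$1"; ds="$2"; tries=0
  while :; do
    counts=$(kubectl get ds "$ds" -n "$ns" \
      -o jsonpath='{.status.updatedNumberScheduled} {.status.desiredNumberScheduled}') || return 1
    updated=${counts% *}; desired=${counts#* }
    # A missing field prints as empty, so treat it as zero.
    echo "up-to-date: ${updated:-0} / desired: ${desired:-0}"
    [ "${updated:-0}" = "${desired:-0}" ] && return 0
    tries=$((tries + 1))
    [ "$tries" -ge "${MAX_TRIES:-60}" ] && return 1
    sleep "${POLL_SECONDS:-5}"
  done
}

# Usage:
#   wait_ds_updated NAMESPACE COMPONENT_NAME
```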

  • Once all pods are updated and healthy, return to the management cluster and verify the bootstrap status:

kubectl get clusterbootstrap <cluster-name> -n <namespace>

The status should transition to Ready: True.