Resolving ClusterBootstrapFailed: Add-on Reconciliation Failures during Cluster Lifecycle Operations

Article ID: 429609

Products

VMware vSphere Kubernetes Service

Issue/Introduction

When provisioning or upgrading a Kubernetes workload cluster in a managed environment, the cluster never reaches the Ready state (its Ready condition stays False). Specifically, the ClusterBootstrap conditions indicate that one or more add-on components (for example, the CNI, CSI, or cloud provider) have failed to reconcile.
kubectl get cluster shows the cluster is not in a Ready state.
kubectl describe clusterbootstrap <cluster-name> -n <namespace> reveals messages such as:

ClusterBootstrap conditions <Addon-Name>-ReconcileFailed indicate reconcile has failed

Add-on DaemonSets or Deployments within the workload cluster show a discrepancy between Desired and Up-to-Date pod counts.
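
To see at a glance which add-ons are reporting the failure, the describe output can be filtered for the <Addon-Name>-ReconcileFailed pattern quoted above. This is a minimal sketch; the function name failed_addons is illustrative, and it assumes the condition text follows exactly that naming pattern.

```shell
# Extract add-on names from "<Addon-Name>-ReconcileFailed" condition text.
# Reads `kubectl describe clusterbootstrap ...` output on stdin.
# Assumption: failing conditions are named <Addon-Name>-ReconcileFailed.
failed_addons() {
  grep -oE '[A-Za-z0-9._-]+-ReconcileFailed' \
    | sed 's/-ReconcileFailed$//' \
    | sort -u
}

# Usage (on the management cluster):
#   kubectl describe clusterbootstrap <cluster-name> -n <namespace> | failed_addons
```

Each printed name corresponds to one add-on to investigate inside the workload cluster.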

Environment

VMware vSphere Kubernetes Service

Cause

The ClusterBootstrapFailed error typically occurs when the management plane successfully provisions the virtual infrastructure, but the subsequent installation or update of essential services fails.
Common causes include:

OnDelete Update Strategy: The add-on DaemonSet is configured with an updateStrategy of OnDelete. This prevents Kubernetes from automatically replacing old pods with new configurations, causing the reconciliation process to wait indefinitely.

Image Pull Secrets/Connectivity: Workload nodes cannot reach the container registry, or cannot authenticate to it, and therefore fail to pull the new version of the add-on image.

Resource Constraints: Nodes lack sufficient CPU or Memory to schedule the updated add-on pods.

Taints and Tolerations: New pods cannot be scheduled because the nodes have taints that the add-on pods do not tolerate (common during upgrades).
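
The last three causes usually show up in pod status reasons or scheduler event messages. The sketch below maps such a message to one of the causes listed above. The function name classify_addon_failure is illustrative, and the matched strings (ErrImagePull, ImagePullBackOff, "Insufficient cpu", "Insufficient memory", "taint") are common Kubernetes wordings that may vary between versions. An OnDelete strategy produces no such message and is checked separately in the Resolution steps.

```shell
# Map a pod status reason or scheduler message to a likely cause.
# Assumption: the message uses the common Kubernetes wordings matched below.
# The first matching pattern wins if a message mentions several problems.
classify_addon_failure() {
  case "$1" in
    *ErrImagePull*|*ImagePullBackOff*)
      echo "registry: image pull failed (check connectivity and pull secrets)" ;;
    *"Insufficient cpu"*|*"Insufficient memory"*)
      echo "resources: nodes lack CPU or memory" ;;
    *taint*)
      echo "taints: pod does not tolerate a node taint" ;;
    *)
      echo "unknown: inspect pod events manually" ;;
  esac
}

# Usage (inside the workload cluster):
#   classify_addon_failure "$(kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{.status.containerStatuses[*].state.waiting.reason}')"
```

For pods stuck in Pending, pass the FailedScheduling message shown by kubectl describe pod instead of the container waiting reason.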

Resolution

Identify the Stuck Component:

Log in to the workload cluster and identify which system component is failing to update:

kubectl get daemonset -A

Look for components where the UP-TO-DATE value is less than the DESIRED value.
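
On clusters with many DaemonSets, the comparison can be automated. The sketch below reads the table printed by kubectl get daemonset -A and prints only the rows where UP-TO-DATE is lower than DESIRED. The function name stale_daemonsets is illustrative, and it assumes the default table output with the -A flag, so that the first two columns are NAMESPACE and NAME.

```shell
# Print DaemonSets whose UP-TO-DATE count is below DESIRED.
# Reads the default table output of `kubectl get daemonset -A` on stdin.
stale_daemonsets() {
  awk '
    NR == 1 {
      # Locate the two columns by header name rather than fixed position.
      for (i = 1; i <= NF; i++) {
        if ($i == "DESIRED")    d = i
        if ($i == "UP-TO-DATE") u = i
      }
      next
    }
    d && u && ($u + 0) < ($d + 0) {
      print $1 "/" $2 " desired=" $d " up-to-date=" $u
    }
  '
}

# Usage:
#   kubectl get daemonset -A | stale_daemonsets
```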

Check the Update Strategy:

Check if the component is waiting for a manual trigger due to an OnDelete strategy:

kubectl get ds <component-name> -n <namespace> -o jsonpath='{.spec.updateStrategy.type}'
If the output is OnDelete, the pods will not update until the old ones are removed.
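
To check every DaemonSet in one pass instead of one component at a time, the same field can be listed for all of them and filtered. A minimal sketch, assuming tab-separated namespace, name, and strategy columns produced by the jsonpath expression in the usage comment; the function name ondelete_daemonsets is illustrative.

```shell
# Print namespace/name of every DaemonSet that uses the OnDelete strategy.
# Reads tab-separated "namespace<TAB>name<TAB>strategy" lines on stdin.
ondelete_daemonsets() {
  awk -F '\t' '$3 == "OnDelete" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get ds -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.updateStrategy.type}{"\n"}{end}' \
#     | ondelete_daemonsets
```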

Trigger the Reconciliation:

To resolve a stuck OnDelete reconciliation, use one of the following methods:

Method A: Manual Pod Deletion (Immediate)

Force the update by deleting the existing pods. The DaemonSet controller will immediately recreate them using the new configuration:

kubectl delete pods -n <namespace> -l <label-selector-for-component>
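
If the label selector for the component is not known, it can be derived from the DaemonSet itself. The sketch below converts the matchLabels map into the key=value,key=value form that the -l flag expects. The function name labels_json_to_selector is illustrative. It assumes that the kubectl version in use prints the map as JSON for this jsonpath query and that the DaemonSet selects its pods with matchLabels only (no matchExpressions).

```shell
# Convert a JSON label map such as {"app":"x","tier":"y"} read from stdin
# into a kubectl label selector such as app=x,tier=y.
# Assumption: label keys and values contain no ':' or ',' (Kubernetes label
# syntax does not allow them).
labels_json_to_selector() {
  tr -d '{}" \n' | sed 's/:/=/g'
}

# Usage:
#   selector=$(kubectl get ds <component-name> -n <namespace> \
#     -o jsonpath='{.spec.selector.matchLabels}' | labels_json_to_selector)
#   kubectl get pods -n <namespace> -l "$selector" -o wide   # preview first
```

Listing the matching pods before deleting them confirms that the selector targets only the intended component.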

Method B: Patch to RollingUpdate (Recommended)

Change the strategy so that Kubernetes manages the rollout automatically in the future:

kubectl patch ds <component-name> -n <namespace> -p '{"spec":{"updateStrategy":{"type":"RollingUpdate"}}}'
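
The patch above leaves the rollout pace at the Kubernetes default. To state explicitly how many pods may be unavailable at a time, the patch body can also set rollingUpdate.maxUnavailable. A minimal sketch; the helper name rolling_update_patch is illustrative and accepts whole numbers only, not percentages.

```shell
# Build a DaemonSet patch that switches to RollingUpdate and limits how many
# pods may be unavailable at once (defaults to 1, the Kubernetes default).
rolling_update_patch() {
  max_unavailable="${1:-1}"
  case "$max_unavailable" in
    ''|*[!0-9]*)
      echo "max_unavailable must be a whole number" >&2
      return 1 ;;
  esac
  printf '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":%s}}}}\n' \
    "$max_unavailable"
}

# Usage:
#   kubectl patch ds <component-name> -n <namespace> -p "$(rolling_update_patch 1)"
```

Keeping the value at 1 replaces one node's pod at a time, which is the gentlest option for networking and storage add-ons.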

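Monitor the rollout until UP-TO-DATE matches DESIRED. The following command works only when the update strategy is RollingUpdate (Method B); if the strategy is still OnDelete (Method A), re-run kubectl get daemonset -A and compare the counts instead: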

kubectl rollout status ds/<component-name> -n <namespace>
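
For either method, completion can also be checked directly from the DaemonSet status counters. The sketch below succeeds only when the updated and ready pod counts both equal the desired count. The function name ds_rollout_done and the comma-separated input format are illustrative; desiredNumberScheduled, updatedNumberScheduled, and numberReady are the standard DaemonSet status fields.

```shell
# Succeed (exit 0) when a DaemonSet is fully updated and ready.
# Reads one line "desired,updated,ready" on stdin; an empty field counts as 0.
ds_rollout_done() {
  IFS=, read -r desired updated ready
  [ "${desired:-0}" -gt 0 ] &&
    [ "${updated:-0}" -eq "$desired" ] &&
    [ "${ready:-0}" -eq "$desired" ]
}

# Usage: poll every 10 seconds until the rollout has finished.
#   until kubectl get ds <component-name> -n <namespace> \
#       -o jsonpath='{.status.desiredNumberScheduled},{.status.updatedNumberScheduled},{.status.numberReady}' \
#       | ds_rollout_done; do
#     sleep 10
#   done
```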

Once all pods are updated and healthy, return to the management cluster and verify the bootstrap status:

kubectl get clusterbootstrap <cluster-name> -n <namespace>
The status should transition to Ready: True
