Error: "MachineSet and ControlPlane versions do not conform to kubeadm version skew policy"
search cancel

Error: "MachineSet and ControlPlane versions do not conform to kubeadm version skew policy"

book

Article ID: 442419

calendar_today

Updated On:

Products

VMware Tanzu Kubernetes Grid Management

Issue/Introduction

  • During a cluster upgrade, scale-up, or remediation of a failed worker node, the operation stalls indefinitely. New worker nodes are not provisioned in the infrastructure provider, and the MachineDeployment or MachineSet shows unavailable replicas.

  • When checking the capi-controller-manager logs or describing the Machine object, the following error is observed:
    Performing "Scale up" on hold because MachineSet version (<version>) and ControlPlane version (<version>) do not conform to kubeadm version skew policy as kubeadm only supports joining with the same major+minor version as the control plane ("KubeadmVersionSkew" preflight check failed)

Environment

  • 1.28

Cause

  • This is a safety mechanism enforced by the Cluster API MachineSet controller.

  • By default, the kubeadm utility strictly requires that joining nodes match the exact major and minor version of the Control Plane. If an upgrade is interrupted, or if a cluster attempts to replace a deleted older worker node (e.g., v1.30.x) while the Control Plane has already been upgraded to a newer version (e.g., v1.31.x), CAPI detects this N-1 version skew.

  • To prevent provisioning a VM that it assumes will fail to join the cluster, CAPI places the scale-up operation "on hold" and blocks the creation of the underlying infrastructure Machine.

Resolution

To break the deadlock, you must temporarily instruct the Cluster API controllers to bypass the KubeadmVersionSkew preflight check. This allows the node to provision, satisfy the replica requirements, and unblock the queue so the upgrade process can resume.

  1. Identify the stalled MachineDeployment:

    List the MachineDeployments in the affected workload cluster namespace to find the one with unavailable replicas:

    kubectl get machinedeployment -n <workload-namespace>


  2. Apply the bypass annotation:

    Annotate the stalled MachineDeployment(s) to explicitly skip the version skew preflight check. CAPI will automatically propagate this annotation down to the underlying MachineSets.

    kubectl annotate machinedeployment <machinedeployment-name> -n <workload-namespace> machineset.cluster.x-k8s.io/skip-preflight-checks="KubeadmVersionSkew"

    Note: If you have multiple MachineDeployments, apply the annotation to all of them.


  3. Safely Resume the CLI Upgrade:

    Because the initial upgrade abandoned the worker nodes, you must run the upgrade command again.

    WARNING: Do not run a generic tanzu cluster upgrade command without specifying a version. If the management cluster's default version has moved to N-2 (e.g., 1.32.x), the CLI will attempt to skip a minor version, permanently breaking the worker nodes.

    1. Find the exact Tanzu Kubernetes Release (TKr) that matches your current Control Plane version:

      tanzu kubernetes-release get


    2. Re-run the upgrade, locking it to that specific TKr:

      tanzu cluster upgrade <cluster-name> -n <workload-namespace> --tkr <exact-tkr-name> -v 9

      Note: The CLI will detect that the Control Plane is already at this version, skip it, and proceed directly to upgrading the MachineDeployments.


  4. Verify infrastructure provisioning

    Watch the Machine objects to confirm that the preflight check is bypassed and CAPI has resumed provisioning the infrastructure:

    kubectl get machines -n <workload-namespace> -w

    Note: You should see new Machines transition from Provisioning to Provisioned, and eventually Running.


  5. Post-Resolution Cleanup:

    Note: Leaving preflight checks disabled permanently is not recommended, as it bypasses critical cluster safety rails. Once the nodes have successfully joined the cluster and the version skew is resolved (i.e., the worker nodes successfully upgrade to match the Control Plane), remove the annotation.

    Run the following command, ensuring you include the trailing minus (-) sign at the end of the annotation key to delete it:

    kubectl annotate machinedeployment <machinedeployment-name> -n <workload-namespace> machineset.cluster.x-k8s.io/skip-preflight-checks-