Worker nodes fail to join cluster during upgrade due to KubeadmConfigTemplate field modification

Article ID: 440734

Updated On:

Products

VMware Telco Cloud Automation

VMware Telco Cloud Platform

Issue/Introduction

During a Kubernetes upgrade workflow, newly provisioned worker nodes fail to join the workload cluster despite receiving IP addresses.

Observed Behaviors:

  • kubectl get nodes from the workload cluster does not show the new nodes.
  • kubectl get machines from the management cluster shows machines in a Provisioning state.
  • New nodes are provisioned and obtain IPs, but the bootstrap process appears to stall.
  • Restarting capv-controller-manager or capi-controller-manager does not resolve the issue.
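The second symptom can be checked across all namespaces with a small filter. This is a minimal sketch, not part of the product tooling: the function names `filter_provisioning` and `list_stuck_machines` are hypothetical, and the sample rows are made up. It assumes the Machine objects expose their phase at `.status.phase`, as Cluster API does.

```shell
#!/bin/sh
# Keep only rows whose third column (the phase) is "Provisioning"
# and print them as namespace/name.
filter_provisioning() {
  awk '$3 == "Provisioning" { print $1 "/" $2 }'
}

# Hypothetical wrapper: run against the management cluster context.
list_stuck_machines() {
  kubectl get machines -A --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase \
    | filter_provisioning
}

# Demonstration with made-up rows (no cluster needed).
printf '%s\n' \
  'ns1 np1-abc Running' \
  'ns1 np1-def Provisioning' | filter_provisioning
# prints: ns1/np1-def
```

Machines that stay in this list long after they have received an IP address match the behavior described above.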

Environment

VMware Telco Cloud Platform (TCP) 5.0

VMware Telco Cloud Automation (TCA) 3.2

Cause

The issue is caused by the container-runtime or container-runtime-endpoint fields having been added or modified in the KubeadmConfigTemplate Custom Resource (CR).
These fields are no longer supported in Kubernetes 1.27 and later, and their presence causes node initialization to fail.
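To confirm whether a template carries these fields, its YAML can be scanned for the two keys. The sketch below is illustrative only: the function names, the template name, and the argument values in the sample are hypothetical, and it assumes kubeletExtraArgs is written as a key/value map (if the template uses a list of name/value entries, the pattern will not match).

```shell
#!/bin/sh
# Print (with line numbers) any offending kubelet arguments found in YAML
# read from stdin. Exit status is 0 if a key is found, non-zero otherwise.
find_removed_kubelet_args() {
  grep -nE '^[[:space:]]*"?container-runtime(-endpoint)?"?[[:space:]]*:'
}

# Illustrative template; the name and the values are hypothetical.
sample_template() {
  cat <<'EOF'
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: example-nodepool-template
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: external
            container-runtime: remote
            container-runtime-endpoint: unix:///run/containerd/containerd.sock
EOF
}

sample_template | find_removed_kubelet_args
```

Against a live object, the same filter could be fed from `kubectl get kubeadmconfigtemplate <template-name> -n <namespace> -o yaml` on the management cluster.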

Resolution

To resolve this, you must manually delete the conflicting keys from the live KubeadmConfigTemplate object to allow the automation to reconcile the configuration.

  1. Identify the affected Template: On the management cluster, find the template associated with the stuck node pool:
    kubectl get kubeadmconfigtemplate -n <namespace>
    
  2. Edit the Template:
    kubectl edit kubeadmconfigtemplate <template-name> -n <namespace>
  3. Remove Conflicting Keys: Navigate to spec.template.spec.joinConfiguration.nodeRegistration.kubeletExtraArgs and explicitly delete the following keys:
    • container-runtime
    • container-runtime-endpoint
  4. Save and Exit: Saving the change triggers reconciliation. The cluster manager then reconciles the underlying MachineDeployments against the corrected template, so new machines no longer receive the unsupported arguments.

  5. Verify: Monitor the nodes in the workload cluster with kubectl get nodes. The new worker nodes should now complete bootstrap and join the cluster, and kubectl get machines on the management cluster should show each machine with a populated NODENAME.
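Where an interactive editor is impractical, steps 2 through 4 could also be done with a JSON Patch. This is a hedged sketch rather than the documented procedure: `build_remove_patch` and `remove_runtime_args` are hypothetical helper names, and it assumes kubeletExtraArgs is a key/value map at the path given in step 3. A JSON Patch "remove" operation fails if its target does not exist, so pass only the keys that are actually present in the template.

```shell
#!/bin/sh
# Path from step 3, written as a JSON Pointer.
ARGS_PATH=/spec/template/spec/joinConfiguration/nodeRegistration/kubeletExtraArgs

# Emit a JSON Patch (RFC 6902) with one "remove" operation per key given.
build_remove_patch() {
  patch=""
  for key in "$@"; do
    op=$(printf '{"op":"remove","path":"%s/%s"}' "$ARGS_PATH" "$key")
    patch="${patch:+$patch,}$op"
  done
  printf '[%s]\n' "$patch"
}

# Hypothetical helper: apply the patch on the management cluster.
# $1 = template name, $2 = namespace, remaining arguments = keys to remove.
remove_runtime_args() {
  name=$1
  ns=$2
  shift 2
  kubectl patch kubeadmconfigtemplate "$name" -n "$ns" \
    --type=json -p "$(build_remove_patch "$@")"
}

# Show the patch that would be sent when both keys are present.
build_remove_patch container-runtime container-runtime-endpoint
```

A call such as `remove_runtime_args <template-name> <namespace> container-runtime container-runtime-endpoint` would then replace the manual edit; the verification in step 5 is unchanged.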