Management Cluster Rolling Update Stuck in Provisioning State due to template accessibility

Article ID: 436742

Updated On:

Products

VMware Telco Cloud Automation

Issue/Introduction

  • After a rolling update is triggered, kubectl get machines -A shows the new machines remaining in the Provisioning state indefinitely.
  • Restarting the capv-controller-manager pods and the pods in the capi-system namespace does not change the behavior.
  • Deleting the stuck machines creates replacement machines that also fail to provision.
  • Describing a stuck machine shows an error about template availability:
    kubectl describe machine <stuck-machine-name> -n <namespace>

    Example Error:

    unable to find template by name '/Datacenter/vm/template-name-herev1.0...'
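
The symptom checks above can be scripted. The following is a minimal sketch, assuming kubectl is already pointed at the management cluster context. The function names list_provisioning_machines and template_errors are hypothetical helpers for illustration, not part of TCA or Cluster API.

```shell
# Hypothetical helpers; assumes kubectl targets the management cluster.

# Print "namespace/name" for each Machine whose phase is Provisioning.
list_provisioning_machines() {
  kubectl get machines -A --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase |
    awk '$3 == "Provisioning" { print $1 "/" $2 }'
}

# Print only the template lookup errors from a Machine description.
# Usage: template_errors <machine-name> <namespace>
template_errors() {
  kubectl describe machine "$1" -n "$2" | grep -i 'unable to find template'
}
```

Running list_provisioning_machines first and then template_errors against each result helps confirm that every stuck machine reports the same missing template path before any change is made.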

Environment

TCA 3.x

Cause

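The CAPV controller clones each new machine from the vSphere template referenced in the cluster configuration. If that template cannot be found at the configured inventory path, for example because it has been deleted, moved, or renamed, the machine deployment halts and the machine stays in the Provisioning state until the template becomes available.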
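
To see which path the controller is looking for, the configured value can be read from the VSphereMachineTemplate object and compared with the path observed in vCenter. This is a sketch under the assumption that the spec.template.spec.template field holds the value the controller resolves; the helper names configured_template_path and check_template_path are hypothetical.

```shell
# Hypothetical helpers; assumes kubectl targets the management cluster.

# Print the vCenter template value configured on a VSphereMachineTemplate.
# Usage: configured_template_path <template-name> <namespace>
configured_template_path() {
  kubectl get vspheremachinetemplate "$1" -n "$2" \
    -o jsonpath='{.spec.template.spec.template}'
}

# Compare the configured value with the path seen in the vCenter inventory.
# Usage: check_template_path <template-name> <namespace> <vcenter-inventory-path>
check_template_path() {
  configured="$(configured_template_path "$1" "$2")"
  if [ "$configured" = "$3" ]; then
    echo "match: $configured"
  else
    echo "mismatch: configured='$configured' vcenter='$3'"
    return 1
  fi
}
```

The comparison is an exact string match, so a difference in folder, name, or suffix is reported as a mismatch.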

Resolution

  1. Log in to vCenter and confirm that the template exists at the exact inventory path shown in the error. If the template is in a different folder or has a different name (for example, it is missing the unique hash suffix), the CAPV controller cannot find it.

  2. If the template exists but the path in the Management Cluster configuration is wrong, update the template object:
    kubectl edit vspheremachinetemplate <template-name> -n <namespace>
    Update the spec.template.spec.template field to match the actual vCenter inventory path.

  3. Once the template is corrected, pause and then resume the cluster to trigger a fresh reconciliation loop:

    # Pause reconciliation
    kubectl patch cluster <cluster-name> -n <namespace> --type merge -p '{"spec":{"paused":true}}'
    # Resume reconciliation
    kubectl patch cluster <cluster-name> -n <namespace> --type merge -p '{"spec":{"paused":false}}'
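
The pause and resume step can be wrapped in a small function so that both patches are always applied in order. This is a sketch only: retrigger_reconcile and the PAUSE_SECONDS variable are hypothetical names, the wait duration is arbitrary, and the function assumes the Cluster object and its machines live in the namespace passed as the second argument.

```shell
# Hypothetical helper; assumes kubectl targets the management cluster.

# Pause and then resume reconciliation for one Cluster object.
# Usage: retrigger_reconcile <cluster-name> <namespace>
retrigger_reconcile() {
  cluster="$1"
  ns="$2"
  kubectl patch cluster "$cluster" -n "$ns" --type merge \
    -p '{"spec":{"paused":true}}' || return 1
  # Brief wait before resuming; the duration is an arbitrary choice.
  sleep "${PAUSE_SECONDS:-5}"
  kubectl patch cluster "$cluster" -n "$ns" --type merge \
    -p '{"spec":{"paused":false}}' || return 1
  # List machines so progress past Provisioning can be observed.
  kubectl get machines -n "$ns"
}
```

If the first patch fails, the function returns without attempting the second. If the second patch fails, the cluster is left paused and must be resumed manually.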