TKGm Management Cluster upgrade fails with "failed to apply providers upgrade: failed to migrate vSphereMachineTemplate" error

Article ID: 376546

Products

VMware Tanzu Kubernetes Grid Management, VMware Tanzu Kubernetes Grid, VMware Tanzu Kubernetes Grid 1.x, VMware Tanzu Kubernetes Grid Plus, VMware Tanzu Kubernetes Grid Plus 1.x, Tanzu Kubernetes Grid

Issue/Introduction

The TKGm Management Cluster (MC) upgrade fails with the following error:

Migrating CRs, this operation may take a while... kind="VSphereMachineTemplate"
Error: failed to upgrade management cluster providers: failed to apply providers upgrade: failed to migrate <namespace>/<vSphereMachineTemplate-name>: action failed after 10 attempts: admission webhook "validation.vspheremachinetemplate.infrastructure.x-k8s.io" denied the request: VSphereMachineTemplate.infrastructure.cluster.x-k8s.io "<vSphereMachineTemplate-name>" is invalid: spec.template.spec: Invalid value
...
VSphereMachineTemplate spec.template.spec field is immutable. Please create a new resource instead.
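
The request is rejected by the vSphere Cluster API provider (CAPV) validating webhook while the upgrade migrates Cluster API custom resources. If needed, the presence of this webhook can be confirmed from the Management Cluster context with a standard kubectl query:

# kubectl get validatingwebhookconfiguration | grep -i vsphere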

Environment

This issue was observed during a TKGm 2.5.1 -> 2.5.2 upgrade.

Cause

Undetermined. The issue was not reproducible in local Engineering environments.

Resolution

  • From the error message, identify the vSphereMachineTemplate that is causing the issue:

failed to migrate <namespace>/<vSphereMachineTemplate-name>

  • In the Management Cluster context, find all the existing versions of that vSphereMachineTemplate:
    # kubectl get vspheremachinetemplate -n <above-namespace> | grep <above-vSphereMachineTemplate-name>

    E.g.:
    # kubectl get vspheremachinetemplate -n default | grep <>-control-plane
    <>-control-plane                                     2y82d
    <>-control-plane-v1-28-7-vmware-1-<>                 126d

  • Compare the YAML definition of the original <vSphereMachineTemplate-name> that the error complains about with the latest vSphereMachineTemplate object of that name.
    # kubectl get vspheremachinetemplate -n <above-namespace> <vSphereMachineTemplate-name> -o yaml
    # kubectl get vspheremachinetemplate -n <above-namespace> <vSphereMachineTemplate-name>-<latest-version> -o yaml


    E.g.:
    In the example above, we would need to compare the output of:
    # kubectl get vspheremachinetemplate -n default <>-control-plane -o yaml
    # kubectl get vspheremachinetemplate -n default <>-control-plane-v1-28-7-vmware-1-<> -o yaml
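
    As a quicker way to spot the differences, the two definitions can also be dumped to files and compared with diff (the file names here are only illustrative):
    # kubectl get vspheremachinetemplate -n default <>-control-plane -o yaml > original.yaml
    # kubectl get vspheremachinetemplate -n default <>-control-plane-v1-28-7-vmware-1-<> -o yaml > latest.yaml
    # diff original.yaml latest.yaml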

  • Identify any differences in the .spec.template.spec fields.

    E.g.:
    Output for <>-control-plane object shows:

    spec:
      template:
        metadata: {}
        spec:
          cloneMode: fullClone
          datacenter: /<datacenter-name>
          datastore: /<datastore-path>
          diskGiB: 20
          folder: /<folder-path>
          memoryMiB: 4096
          network:
            devices:
            - dhcp4: true
              networkName: <network-name>
          numCPUs: 2
          resourcePool: /<resourcePool-path>
          server: <server-name>
          template: /<template-path>

    Output for <>-control-plane-v1-28-7-vmware-1-<> object shows:

    spec:
      template:
        metadata: {}
        spec:
          cloneMode: fullClone
          datacenter: /<datacenter-name>
          datastore: /<datastore-path>
          diskGiB: 20
          folder: /<folder-path>
          memoryMiB: 4096
          network:
            devices:
            - dhcp4: true
              networkName: <network-name>
          numCPUs: 2
          powerOffMode: hard
          resourcePool: /<resourcePool-path>
          server: <server-name>
          template: /<template-path>

    We can see that the original template the error complains about is missing the .spec.template.spec.powerOffMode field.
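
    To focus the comparison on the immutable section only, the .spec.template.spec of each object can also be printed directly with kubectl's jsonpath output:
    # kubectl get vspheremachinetemplate -n default <>-control-plane -o jsonpath='{.spec.template.spec}'
    # kubectl get vspheremachinetemplate -n default <>-control-plane-v1-28-7-vmware-1-<> -o jsonpath='{.spec.template.spec}'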

  • Recreate the original vSphereMachineTemplate, updating it to resolve any differences with the latest vSphereMachineTemplate object of that name.
    # kubectl get vspheremachinetemplate -n <above-namespace> <vSphereMachineTemplate-name> -o yaml > <vSphereMachineTemplate-name>_bkp.yaml
    # cp <vSphereMachineTemplate-name>_bkp.yaml <vSphereMachineTemplate-name>_new.yaml
    # vim <vSphereMachineTemplate-name>_new.yaml
    # kubectl delete vspheremachinetemplate -n <above-namespace> <vSphereMachineTemplate-name>
    # kubectl apply -f <vSphereMachineTemplate-name>_new.yaml

    E.g.:
    # kubectl get vspheremachinetemplate -n default <>-control-plane -o yaml > <>-control-plane_bkp.yaml
    # cp <>-control-plane_bkp.yaml <>-control-plane_new.yaml
    # vim <>-control-plane_new.yaml

    Here we would add the missing .spec.template.spec.powerOffMode: hard field.
    We would also remove the server-managed .metadata.creationTimestamp, .metadata.resourceVersion and .metadata.uid fields, leaving a file like this:

    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereMachineTemplate
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {<>}
      generation: 1
      name: <>-control-plane
      namespace: default
      ownerReferences:
      - apiVersion: cluster.x-k8s.io/v1alpha3
        kind: Cluster
        name: <cluster-name>
        uid: <cluster-uid>
    spec:
      template:
        metadata: {}
        spec:
          cloneMode: fullClone
          datacenter: /<datacenter-name>
          datastore: /<datastore-path>
          diskGiB: 20
          folder: /<folder-path>
          memoryMiB: 4096
          network:
            devices:
            - dhcp4: true
              networkName: <network-name>
          numCPUs: 2
          powerOffMode: hard
          resourcePool: /<resourcePool-path>
          server: <server-name>
          template: /<template-path>

    # kubectl delete vspheremachinetemplate -n default <>-control-plane
    # kubectl apply -f <>-control-plane_new.yaml
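
    Before retriggering the upgrade, the recreated object can be verified to carry the previously missing field (expected output in this example: hard):
    # kubectl get vspheremachinetemplate -n default <>-control-plane -o jsonpath='{.spec.template.spec.powerOffMode}'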

  • Retrigger the TKGm MC upgrade.
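
    E.g., with the Tanzu CLI, rerun the same upgrade command that failed initially:
    # tanzu management-cluster upgrade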