Unable to Upgrade vSphere Supervisor Workload Cluster - No Updates Available/Updates Available is Blank or Desired TKR is not Available due to Missing OsImage

Article ID: 396745

Products

Tanzu Kubernetes Runtime, VMware vSphere Kubernetes Service, VMware Tanzu Mission Control, VMware Tanzu Mission Control - SM

Issue/Introduction

When looking to upgrade a workload cluster in a vSphere Supervisor environment, the desired TKR to upgrade to is not available or not listed under Updates Available.

 

While connected to the Supervisor cluster context, the following symptoms are observed:

  • For a non-classy cluster, checking the TKC object shows Updates Available as empty/blank or not listing the desired TKR to upgrade to:
    • kubectl get tkc -n <cluster namespace>
    • NAMESPACE            NAME              CONTROL PLANE  WORKER  TKR NAME         AGE   READY  TKR COMPATIBLE  UPDATES AVAILABLE
      <cluster namespace>  <cluster name a>  #              #       <TKR.VERSION.A>  ###d  True   True            [<TKR.VERSION.B> <TKR.VERSION.D>]
      <cluster namespace>  <cluster name b>  #              #       <TKR.VERSION.B>  ###d  True   True
    • The above values will vary by environment.

  • When describing the cluster object, the list of TKRs available to upgrade to is empty or does not contain the desired TKR:
    • kubectl describe cluster -n <cluster namespace> <cluster name>
    • Type: TopologyReconciled
      Last Transition Time: YYYY-MM-DDTHH:MM:SSZ
      Message: [<TKR.VERSION.B> <TKR.VERSION.D>]

  • The desired TKR to upgrade to is present in the TKR list, showing as compatible True:
    • kubectl get tkr | grep <tkr version>
    • <TKR---VERSION.C>    <TKR+VERSION.C>   True   True   ###d

  • There are no system pods that are failing in the Supervisor cluster:
    • kubectl get pods -A | grep -v Run

       

  • When attempting to upgrade the cluster through kubectl CLI, an error message similar to the following is returned, where the values in brackets <> will vary by environment:
    • In TKG service 3.2 and higher, the error message will refer to KR instead of TKR.
    • error: clusters.cluster.x-k8s.io "<cluster name>" could not be patched: admission webhook "tkr-resolver-cluster-webhook.tanzu.vmware.com" denied the request: could not resolve TKR/OSImage for controlPlane, machineDeployments: [<nodepool-name>], query: {controlPlane: {k8sVersionPrefix: '<TKR.VERSION>', tkrSelector: 'tkr.tanzu.vmware.com/standard', osImageSelector: 'os-arch=amd64,os-name=<operating system>,os-version=<operating system version>,tkr.tanzu.vmware.com/standard'}, machineDeployments: [{k8sVersionPrefix: '<TKR.VERSION>', tkrSelector: 'tkr.tanzu.vmware.com/standard', osImageSelector: 'os-arch=amd64,os-name=<operating system>,os-version=<operating system version>'}]}, result: {controlPlane: {k8sVersion: '', tkrName: '', osImagesByTKR: map[]}, machineDeployments: [{k8sVersion: '', tkrName: '', osImagesByTKR: map[]}]}
    • This issue and error message can also occur when creating a workload cluster on the missing TKR.
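
The osImageSelector values in the error message show which operating system and version the webhook tried to resolve. As a minimal sketch, assuming the error text has been saved or piped in unchanged, the selectors can be pulled out with standard text tools. The function names below are hypothetical helpers, not part of kubectl or the TKG service.

```shell
#!/bin/sh
# Hypothetical helpers: read a tkr-resolver webhook error message on stdin.

# Print every osImageSelector value found in the message, one per line.
extract_os_image_selectors() {
  grep -o "osImageSelector: '[^']*'" | sed "s/^osImageSelector: '//; s/'\$//"
}

# Print the distinct os-name values requested across all selectors.
os_names_from_error() {
  extract_os_image_selectors | tr ',' '\n' | sed -n 's/^os-name=//p' | sort -u
}
```

For example, `kubectl apply -f cluster.yaml 2>&1 | os_names_from_error` (where cluster.yaml is a placeholder file name) would print the operating system name that the request was looking for, which can then be compared against the osimage objects present in the Supervisor cluster.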

 

In the vSphere web UI, the following conditions regarding the content library and namespace are confirmed:

  • The content library associated with the workload cluster's namespace contains the desired TKR with desired OS to upgrade to.
  • This content library is not encountering the following thumbprint issue: SSL certificate for host wp-content.vmware.com

 

In the Tanzu Mission Control (TMC) web console, the affected workload cluster cannot be upgraded, or the desired TKR version is not available for selection.

Environment

vSphere with Tanzu 7.0

vSphere with Tanzu 8.0

This issue can occur whether or not the cluster is managed or upgraded by Tanzu Mission Control (TMC).

Cause

The list of available updates for a workload cluster is populated based on four factors: the N+1 version for the next TKR, the OS version of the workload cluster, the compatibility of the next TKR with the environment, and the availability of the image in the Supervisor cluster.

A workload cluster cannot skip TKR versions and must be upgraded sequentially. For example, v1.27 to v1.28 then v1.28 to v1.29.
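
The sequential rule can be checked before attempting an upgrade. The sketch below is illustrative only: the function names are hypothetical, the version strings are example values, and it assumes that a target in the same minor version (a patch-level change) is also acceptable. It compares minor versions only and does not check patch ordering.

```shell
#!/bin/sh
# Print the minor version number of a string such as "v1.28.8+vmware.1-fips.1-tkg.2".
minor_of() {
  v=${1#v}                  # drop the leading "v"
  v=${v#*.}                 # drop the major version and the first dot
  printf '%s\n' "${v%%.*}"  # keep everything up to the next dot
}

# Succeed when the target is in the same minor version or exactly one minor
# version above the current one. Assumption: both share the same major version.
is_sequential_upgrade() {
  cur=$(minor_of "$1")
  tgt=$(minor_of "$2")
  diff=$((tgt - cur))
  [ "$diff" -eq 0 ] || [ "$diff" -eq 1 ]
}
```

Usage: `is_sequential_upgrade v1.27.11 v1.28.8 && echo ok` succeeds, while `is_sequential_upgrade v1.27.11 v1.29.4` returns a non-zero status because it skips v1.28.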

Once a workload cluster is upgraded to a TKR for vSphere 8.0 (non-legacy), it cannot be upgraded to a TKR for vSphere 7.0 (legacy) regardless of the version.

The operating system of a workload cluster currently cannot be changed. A new workload cluster on the desired OS must be created.

TKRs for vSphere 7.0 (legacy) are listed multiple times for the same version, appended with .ubuntu for the ubuntu operating system.

TKRs for vSphere 8.0 (non-legacy) are not appended with the operating system and an annotation is used to determine the operating system.

Resolution

Initial checks should be made to confirm that the desired TKR matches the following requirements:

  • The desired TKR is no more than one minor version higher than the current cluster's version (for example, v1.27 to v1.28).
    • Workload cluster upgrades must be performed sequentially.

  • The desired TKR is compatible with the environment.
    • While connected to the Supervisor cluster context, the following command can be run to check compatibility:
      • kubectl get tkr | grep <tkr version>
    • The interoperability matrix can be checked for compatibility between the vCenter and TKR.
    • Note: Once the TKG service is installed in a Supervisor cluster, TKR compatibility is determined by the TKR's compatibility with the TKG service.

  • The desired TKR is for the same OS version as the existing workload cluster.
    • TKRs for vSphere 7 (legacy) will be appended with .ubuntu for an ubuntu TKR.
    • TKRs for vSphere 8 (non-legacy) do not have the operating system appended. 
      • The operating system is considered photon by default.
      • An annotation for ubuntu is needed to create an ubuntu cluster.

  • The desired upgrade would not be from a vSphere 8 (non-legacy) TKR to vSphere 7 (legacy) TKR.
    • A workload cluster running on a TKR for vSphere 8 cannot be upgraded to a TKR for vSphere 7 regardless of versions.
    • The Tanzu Kubernetes Releases (TKR) release notes can be checked to see which TKR version is for which vSphere version.
    • Alternatively, the labels on a TKR can be checked to see whether or not the TKR is legacy.

  • The workload cluster to be upgraded has an annotation (specifying the operating system type and operating system version) that is compatible with the desired TKR version.
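
The operating system annotation can be read from the cluster object's YAML. This is a minimal sketch under stated assumptions: the annotation key run.tanzu.vmware.com/resolve-os-image is taken from VMware's cluster provisioning documentation rather than from this article, the value is assumed to be an unquoted selector such as os-name=ubuntu,os-version=22.04, and the function names are hypothetical. Following the default described above, a cluster without the annotation is reported as photon.

```shell
#!/bin/sh
# Read cluster YAML on stdin and print the value of the OS image annotation, if any.
# Assumption: the annotation key is run.tanzu.vmware.com/resolve-os-image.
resolve_os_annotation() {
  sed -n 's/^[[:space:]]*run\.tanzu\.vmware\.com\/resolve-os-image:[[:space:]]*//p' | head -n 1
}

# Print the operating system name the cluster requests, defaulting to photon
# when the annotation is absent or carries no os-name entry.
cluster_os_name() {
  ann=$(resolve_os_annotation)
  case "$ann" in
    *os-name=*) printf '%s\n' "$ann" | tr ',' '\n' | sed -n 's/^os-name=//p' | head -n 1 ;;
    *) echo photon ;;
  esac
}
```

Usage: `kubectl get cluster -n <cluster namespace> <cluster name> -o yaml | cluster_os_name` prints the operating system name to compare against the OS NAME column of the osimage objects.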

 

If the above checks confirm that the desired TKR is a valid upgrade target, the following checks can be made to ensure that its image was properly created in the Supervisor cluster:

  1. Connect to the Supervisor cluster context.

  2. Confirm the type of operating system used by the workload cluster in question:
    • kubectl describe cluster -n <cluster namespace> <cluster name>
    • For vSphere 8 TKRs, an annotation is needed to specify that the cluster is running ubuntu.
      • Without this annotation, the cluster defaults to photon operating system.

  3. Verify that the corresponding image objects were successfully created for the desired TKR:
    • In vSphere 7:
      • kubectl get osimage,virtualmachineimage | grep <tkr version>
    • In vSphere 8:
      • kubectl get osimage,cvmi | grep <tkr version>
  4. Confirm that the osimage matches the operating system and operating system version desired by the workload cluster upgrade:
    • kubectl get osimage | grep <tkr version>
    • NAME        K8S VERSION      OS NAME  OS VERSION  ARCH   TYPE  COMPATIBLE  CREATED
      <vmi-id-a>  <tkr version a>  photon   #.#         amd64  vmi               ###d
      <vmi-id-b>  <tkr version b>  ubuntu   ##.##       amd64  vmi               ###d

    • It is expected for each TKR to support both photon and ubuntu OS. For each TKR, there should be one osimage for photon and one osimage for ubuntu.

  5. If the desired osimage with the desired operating system is missing for the desired TKR version:
    • This is a known issue which was resolved in TKG service 3.1 and higher.

      Missing OsImage with Desired Operating System Workaround Steps:
      1. Connect to the Supervisor cluster context.

      2. Locate the desired TKR:
        • kubectl get tkr | grep <tkr version>
      3. Delete the desired TKR, using the NAME value returned by the previous command:
        • kubectl delete tkr <tkr name>
      4. Confirm that the desired TKR is recreated after a few minutes:
        • kubectl get tkr | grep <tkr version>
      5. Check that the osimage for the TKR version is created for each operating system a few minutes after the TKR was recreated:
        • kubectl get osimage | grep <tkr version>
      6. If the same error persists, please reach out to VMware by Broadcom Technical Support referencing this KB article.

  6. If the above commands do not output the expected virtualmachineimage/clustervirtualmachineimage (cvmi):
    • There is an issue with the system generating the necessary image objects in the Supervisor cluster.
    • Please reach out to VMware by Broadcom Technical Support referencing this KB article.
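
The osimage check in steps 4 and 5 can be sketched as a small filter over the output of kubectl get osimage. It assumes the column layout shown in step 4 (K8S VERSION in the second column, OS NAME in the third) and that each TKR is expected to have both a photon and an ubuntu osimage. The function name is a hypothetical helper.

```shell
#!/bin/sh
# Read `kubectl get osimage` output on stdin and print each operating system
# (photon, ubuntu) that has no osimage for the given Kubernetes version prefix.
# $1: version prefix to match against the K8S VERSION column, e.g. "v1.28.8"
missing_os_for_version() {
  awk -v want="$1" '
    # Skip the header row; record the OS NAME of rows whose version starts with the prefix.
    $1 != "NAME" && index($2, want) == 1 { seen[$3] = 1 }
    END {
      if (!("photon" in seen)) print "photon"
      if (!("ubuntu" in seen)) print "ubuntu"
    }'
}
```

Usage: `kubectl get osimage | missing_os_for_version <tkr version>` prints nothing when both images exist. If it prints an operating system name, that osimage is missing and the workaround in step 5 applies.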