After upgrading from vSphere 7.x to vSphere 8.x, a vSphere Kubernetes Cluster on an Ubuntu TKR fails because it is treated as a Photon TKR

Article ID: 386894

Products

VMware vSphere with Tanzu

Issue/Introduction

After upgrading from vSphere 7.x to vSphere 8.x, a vSphere Kubernetes Cluster using an Ubuntu TKR fails and remains in a Ready: False state.

The Supervisor cluster upgrade completed successfully.

While connected to the Supervisor cluster context, the following symptoms are observed:

  • The affected vSphere Kubernetes cluster's TKC uses an Ubuntu TKR:
    • kubectl get tkc <affected cluster name> -n <affected cluster namespace>

      NAMESPACE       NAME                CONTROL PLANE   WORKER   TKR NAME
      my-cluster-ns   my-ubuntu-cluster   #               #        vX.XX.XX---vmware-X-fips-X.tkg.X.ubuntu

  • The affected vSphere Kubernetes cluster's machines are being deployed with a Photon TKR version (the version string does not reference ubuntu):
    • kubectl get machine -n <affected cluster namespace>

      NAMESPACE       NAME               CLUSTER             VERSION
      my-cluster-ns   my-cluster-cp-a1   my-ubuntu-cluster   vX.XX.XX--vmware.X-fips.X

  • The affected vSphere Kubernetes cluster's TKC is missing the following Ubuntu TKR annotation:
    • kubectl describe tkc <affected cluster name> -n <affected cluster namespace>

      annotations:
        run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu

  • The Supervisor cluster upgrade completed successfully; the output of the following command shows every upgrade step as completed:
    • /usr/lib/vmware-wcp/upgrade/upgrade-ctl.py get-status | jq '.progress | to_entries | .[] | "\(.value.status) - \(.key)"' | sort
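The TKC checks above can be combined into a single test per cluster. The following is a minimal shell sketch, not a VMware-provided tool: the function name check_tkc_os_annotation is hypothetical, and reading the TKR name from .spec.topology.controlPlane.tkr.reference.name is an assumption based on the v1alpha3 TanzuKubernetesCluster layout. It returns 1 when a TKC references an Ubuntu TKR but lacks the resolve-os-image annotation.

```shell
#!/usr/bin/env bash
# Sketch: flag a TKC whose TKR name mentions "ubuntu" but which lacks the
# run.tanzu.vmware.com/resolve-os-image=os-name=ubuntu annotation.
# ASSUMPTION: the TKR name is read from .spec.topology.controlPlane.tkr.reference.name
# (v1alpha3 TanzuKubernetesCluster layout); adjust for other API versions.
# Run while connected to the Supervisor cluster context.

# Usage: check_tkc_os_annotation <cluster name> <namespace>
# Returns 0 if OK or not Ubuntu, 1 if affected, 2 if kubectl fails.
check_tkc_os_annotation() {
  local name="$1" ns="$2"
  local tkr annotation

  tkr=$(kubectl get tkc "$name" -n "$ns" \
    -o jsonpath='{.spec.topology.controlPlane.tkr.reference.name}') || return 2
  # Dots inside the annotation key are escaped so jsonpath treats it as one key.
  annotation=$(kubectl get tkc "$name" -n "$ns" \
    -o jsonpath='{.metadata.annotations.run\.tanzu\.vmware\.com/resolve-os-image}') || return 2

  case "$tkr" in
    *ubuntu*)
      if [ "$annotation" = "os-name=ubuntu" ]; then
        echo "OK: $ns/$name uses $tkr and has the Ubuntu annotation"
        return 0
      fi
      echo "AFFECTED: $ns/$name uses $tkr but resolve-os-image='${annotation}'"
      return 1
      ;;
    *)
      echo "SKIP: $ns/$name uses non-Ubuntu TKR $tkr"
      return 0
      ;;
  esac
}
```

For example, check_tkc_os_annotation my-ubuntu-cluster my-cluster-ns would print an AFFECTED line for a cluster in the state described above.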

Environment

vSphere 8.0 with Tanzu

This issue can occur whether or not the cluster is managed by Tanzu Mission Control (TMC).

Cause

The following annotation is required on vSphere Kubernetes Clusters running on a TKR that uses the Ubuntu OS:

run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu

Without this annotation, nodes are deployed with Photon OS by default.

The Tanzu Kubernetes Cluster controllers are expected to add this annotation automatically during the CAPW to CAPV object migration that takes place when moving to vSphere 8.x or later. In this scenario, however, the annotation was not added.

This can occur if Kubernetes services were unhealthy during the CAPW to CAPV object migration.

It can also occur if an upgrade of the affected cluster's TKC is started before the CAPW to CAPV object migration has finished, that is, while the Supervisor cluster is still being upgraded following the move to vSphere 8.x or later.
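Because the annotation is normally added during a Supervisor-wide migration, more than one cluster may be affected. The following hedged shell sketch lists every TKC across all Supervisor namespaces that references an Ubuntu TKR but lacks the annotation. The function name find_unannotated_ubuntu_tkcs is hypothetical, and the TKR field path .spec.topology.controlPlane.tkr.reference.name is the same v1alpha3 assumption as above.

```shell
#!/usr/bin/env bash
# Sketch: print <namespace>/<name> for every TKC whose TKR name contains
# "ubuntu" but whose resolve-os-image annotation is not "os-name=ubuntu".
# ASSUMPTION: TKR name lives at .spec.topology.controlPlane.tkr.reference.name.
# Run while connected to the Supervisor cluster context.
find_unannotated_ubuntu_tkcs() {
  local tmpl
  # One tab-separated line per TKC: namespace, name, TKR name, annotation value.
  tmpl='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}'
  tmpl="${tmpl}"'{.spec.topology.controlPlane.tkr.reference.name}{"\t"}'
  tmpl="${tmpl}"'{.metadata.annotations.run\.tanzu\.vmware\.com/resolve-os-image}{"\n"}{end}'

  # A missing annotation yields an empty fourth field, which fails the comparison.
  kubectl get tkc -A -o "jsonpath=${tmpl}" |
    awk -F '\t' '$3 ~ /ubuntu/ && $4 != "os-name=ubuntu" { print $1 "/" $2 }'
}
```

Each printed cluster is a candidate for the resolution steps below.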

Resolution

Re-apply the Ubuntu OS annotation to the affected vSphere Kubernetes Cluster's TKC object.

  1. Connect to the Supervisor cluster context.

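  2. Edit the affected cluster's TKC object and add the following Ubuntu annotation under annotations:
    • kubectl edit tkc <affected cluster name> -n <affected cluster namespace>

      annotations:
        run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
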
  3. Check that the Ubuntu annotation was added correctly to the affected cluster's TKC object:
    • kubectl describe tkc <affected cluster name> -n <affected cluster namespace>

      run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu

  4. Confirm that the affected cluster's Cluster object has a run.tanzu.vmware.com/tkr annotation with the expected TKR version:
    • kubectl describe cluster <affected cluster name> -n <affected cluster namespace>

      annotations:
        run.tanzu.vmware.com/tkr: <EXPECTED TKR VERSION>

  5. Confirm that the affected cluster's nodes deploy successfully and settle into a healthy Running state:
    • kubectl get machine,vm,vspheremachine -n <affected cluster namespace>
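Steps 2 through 5 can also be scripted. The shell sketch below is a hedged, non-interactive alternative to kubectl edit. It assumes that applying the annotation with kubectl annotate is equivalent to adding it by hand in the editor. All function names are hypothetical, and the Running check relies on the Cluster API Machine .status.phase field.

```shell
#!/usr/bin/env bash
# Sketch of resolution steps 2-5 as reusable functions.
# ASSUMPTION: `kubectl annotate` is an acceptable substitute for `kubectl edit`.
# Run while connected to the Supervisor cluster context.

readonly OS_ANNOTATION_KEY='run.tanzu.vmware.com/resolve-os-image'
readonly OS_ANNOTATION_VALUE='os-name=ubuntu'

# Step 2: add the annotation. kubectl splits KEY=VALUE on the first "=",
# so the value "os-name=ubuntu" is kept intact. --overwrite replaces any
# existing value.
annotate_tkc_ubuntu() {
  kubectl annotate tkc "$1" -n "$2" --overwrite \
    "${OS_ANNOTATION_KEY}=${OS_ANNOTATION_VALUE}"
}

# Step 3: read the annotation back; succeed only on an exact match.
verify_tkc_ubuntu() {
  local value
  value=$(kubectl get tkc "$1" -n "$2" \
    -o jsonpath='{.metadata.annotations.run\.tanzu\.vmware\.com/resolve-os-image}') || return 2
  [ "$value" = "$OS_ANNOTATION_VALUE" ]
}

# Step 4: print the TKR version recorded on the Cluster object.
cluster_tkr() {
  kubectl get cluster "$1" -n "$2" \
    -o jsonpath='{.metadata.annotations.run\.tanzu\.vmware\.com/tkr}'
}

# Step 5: list "<machine> <phase>" for machines not yet in the Running phase.
machines_not_running() {
  kubectl get machine -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' |
    awk -F '\t' '$2 != "Running" { print $1 " " $2 }'
}
```

For example, annotate_tkc_ubuntu my-ubuntu-cluster my-cluster-ns followed by verify_tkc_ubuntu my-ubuntu-cluster my-cluster-ns applies and confirms the fix. Repeating machines_not_running my-cluster-ns until it prints nothing indicates that every machine has reached Running.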