Existing TKG 2.1.1 MultiOS workload clusters can't be operated after MC is upgraded to TKG 2.2.0 due to TKr issues


Article ID: 319416

Updated On:

Products

VMware Tanzu Kubernetes Grid

Issue/Introduction

Symptoms:
  • This issue affects only users who use multiple OS images in a single workload cluster.
  • The issue occurs on MultiOS workload clusters after the management cluster is upgraded to a version prior to 2.2.1.
  • When MultiOS users operate on (for example, scale or upgrade) a class-based ("classy") workload cluster after the management cluster is upgraded to a version prior to 2.2.1, they see errors like:
 

> tanzu cluster scale gojeta -w 4

Error: error while creating object for "&TypeMeta{Kind:,APIVersion:,}" default/gojeta: admission webhook "tkr-resolver-cluster-webhook.tanzu.vmware.com" denied the request: could not resolve TKR/OSImage for machineDeployments: [md-0], query: {controlPlane: nil, machineDeployments: [{k8sVersionPrefix: 'v1.24.10+vmware.1-tkg.2', tkrSelector: '', osImageSelector: 'image-type=ova,os-name=windows'} nil]}, result: {controlPlane: nil, machineDeployments: [{k8sVersion: '', tkrName: '', osImagesByTKR: map[]} nil]}



Environment

VMware Tanzu Kubernetes Grid 2.1.0
Tanzu Kubernetes Grid 1.5.2
Tanzu Kubernetes Grid 1.6.0

Cause


The TKr resolver should skip a machine deployment (MD) that is already resolved. However, it always checks the TKR_DATA in the cluster-level variable, and in a MultiOS cluster the MDs carry TKR_DATA that differs from the cluster-level value. As a result, the resolver always tries to resolve those MDs again. After the management cluster is upgraded, TKr compatibility changes, so resolution of the MD fails whenever a workload cluster is mutated.

Resolution


The fix is already included in TKG 2.3 and will be backported to 2.2.1.

Workaround:
1. Add a ConfigMap that lists the TKr version used by the workload clusters.
 

apiVersion: v1
data:
  tkrVersions: |
    - <tkr version>
kind: ConfigMap
metadata:
  labels:
    run.tanzu.vmware.com/additional-compatible-tkrs: ""
  name: tkg-additional-compatibility-versions
  namespace: tkg-system
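
The manifest above can be generated for a specific TKr version rather than edited by hand. The following is a minimal sketch, not an official procedure: the function name `render_tkr_compat_configmap` is hypothetical, and the suggested `kubectl apply` step assumes the current kubectl context points at the management cluster. The manifest fields themselves are copied from the article.

```shell
#!/bin/sh
# Sketch: render the workaround ConfigMap for one TKr version.
# The function name is hypothetical; the manifest fields mirror the article.
render_tkr_compat_configmap() {
  tkr_version="$1"
  if [ -z "$tkr_version" ]; then
    echo "usage: render_tkr_compat_configmap <tkr version>" >&2
    return 1
  fi
  # Unquoted heredoc so that ${tkr_version} is expanded into the manifest.
  cat <<EOF
apiVersion: v1
data:
  tkrVersions: |
    - ${tkr_version}
kind: ConfigMap
metadata:
  labels:
    run.tanzu.vmware.com/additional-compatible-tkrs: ""
  name: tkg-additional-compatibility-versions
  namespace: tkg-system
EOF
}

# Assumed usage (kubectl context set to the management cluster):
#   render_tkr_compat_configmap 'v1.24.10+vmware.1-tkg.2' | kubectl apply -f -
```

Printing the manifest to standard output keeps the sketch free of side effects, so the result can be reviewed before it is applied.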

 

2. After the TKr is reconciled, the workload clusters can be operated normally.
 

For example, the TKr status changes from:

NAME                        VERSION                   READY   COMPATIBLE   CREATED
v1.24.10---vmware.1-tkg.2   v1.24.10+vmware.1-tkg.2   False   False        17d


TO:

NAME                        VERSION                   READY   COMPATIBLE   CREATED
v1.24.10---vmware.1-tkg.2   v1.24.10+vmware.1-tkg.2   True    True         17d
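
The READY and COMPATIBLE columns in the listings above can also be checked from a script. This is a minimal sketch under stated assumptions: the helper name `tkr_is_usable` is hypothetical, and it only parses a table with the column layout shown in this article (NAME, VERSION, READY, COMPATIBLE, CREATED) from standard input.

```shell
#!/bin/sh
# Sketch: succeed only if the named TKr row shows READY=True and COMPATIBLE=True.
# Reads a table in the article's column layout from standard input.
tkr_is_usable() {
  awk -v name="$1" '
    $1 == name { found = 1; ok = ($3 == "True" && $4 == "True") }
    END { exit !(found && ok) }   # exit status 0 only when the row exists and both are True
  '
}
```

Assuming the listing comes from a command such as `kubectl get tkr` on the management cluster (the article does not name the command), a possible usage is `kubectl get tkr | tkr_is_usable 'v1.24.10---vmware.1-tkg.2' && echo "TKr reconciled"`.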




The cluster status then returns to normal (TopologyReconciled becomes True).

Once the clusters return to normal, users can scale, update, or delete the MultiOS clusters without errors. Upgrading MultiOS clusters is not yet possible due to other limitations. The resolution for these limitations will be provided in a separate KB, and this article will be updated accordingly.