vSphere with Tanzu TKC upgrade from TKG1.0 to TKG2.0 TKr fails due to KAPP controller deployment

Article ID: 319382

Updated On:

Products

VMware vSphere ESXi
VMware vSphere Kubernetes Service

Issue/Introduction

  • After an attempt to upgrade a vSphere with Tanzu TKC from 1.22.9 to 1.23.15 (or from any TKG 1.0 TKr to a TKG 2.0 TKr), the upgrade hangs.
  • The TKC deploys one new ControlPlane node, but the new node does not successfully join the cluster.
  • Running kubectl get kcp -n <TKC_NAMESPACE> from the Supervisor Cluster shows the ControlPlane nodes in a NotReady state.
  • Running kubectl get machine -n <TKC_NAMESPACE> from the Supervisor Cluster shows the newly deployed ControlPlane node in Provisioned status instead of Running status.
  • From the TKC context, running kubectl get nodes shows the new ControlPlane node as a cluster member, but in NotReady status.
  • Running kubectl get pods -A in the TKC context reports some pods in ImagePullBackOff.
  • From the Supervisor Cluster, describing the ClusterBootstrap object in the TKC namespace shows that the kapp-controller Deployment is invalid:

kubectl describe clusterbootstrap -n <TKC_NAMESPACE> <TKC_NAME>

Example:

kubectl describe clusterbootstrap -n test-namespace test-cluster

  Kapp:
    Ref Name:  kapp-controller.tanzu.vmware.com.0.41.5+vmware.1-tkg.1
    Values From:
      Provider Ref:
        API Group:  run.tanzu.vmware.com
        Kind:       KappControllerConfig
        Name:       cluster-default-kapp-controller-package
  Paused:           false

Message: kapp: Error: update deployment/kapp-controller (apps/v1) namespace: tkg-system: Updating resource deployment/kapp-controller (apps/v1) namespace: tkg-system: API server says: Deployment.apps "kapp-controller" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"kapp-controller", "kapp.k14s.io/app":"1695785206664433910"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable (reason: Invalid)
Status: True
Type: Kapp-Controller-ReconcileFailed
Resolved TKR: v1.23.15---vmware.1-tkg.4
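
The check above can also be scripted. The following is a minimal sketch, not an official tool: the function name has_kapp_selector_conflict is hypothetical, and the three strings it searches for are taken only from the example output shown above, so other releases may word the condition differently.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl describe clusterbootstrap` output on
# stdin and succeeds only if it contains the failure signature from the
# example above (condition type plus the immutable spec.selector error).
has_kapp_selector_conflict() {
  input=$(cat)
  printf '%s\n' "$input" | grep -qF 'Kapp-Controller-ReconcileFailed' || return 1
  printf '%s\n' "$input" | grep -qF 'spec.selector' || return 1
  printf '%s\n' "$input" | grep -qF 'field is immutable' || return 1
  return 0
}

# Usage, from the Supervisor Cluster (placeholders as used in this article):
#   kubectl describe clusterbootstrap -n <TKC_NAMESPACE> <TKC_NAME> \
#     | has_kapp_selector_conflict && echo "kapp-controller selector conflict found"
```

A match only indicates that the describe output resembles this article's example; it does not replace reviewing the full ClusterBootstrap status.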

Environment

VMware vSphere 8.0 with Tanzu

Cause

DevOps users can extend TKG 1.0 workload clusters by installing addons from the TKG standard addons package with tanzu-cli. To do so, they must first install kapp-controller on the TKC.

TKG 2.0 clusters have kapp-controller installed by default. Both core and standard addons are installed as Carvel packages in the new ClassyClusters. When DevOps users upgrade from a TKG 1.0 TKr to a TKG 2.0 TKr, the TKC is migrated to a ClassyCluster and kapp-controller is installed on it again. This conflicts with the kapp-controller that the user already installed: the two disagree on package ownership, and the update to the existing Deployment is rejected because its spec.selector field is immutable, as the error message above shows.

Resolution

The VMware engineering team is aware of this issue and is working on a resolution. In the meantime, use the following workaround.

Workaround:
The following steps require SSH access to the Supervisor ControlPlane VMs. Carry out this workaround together with VMware support engineers to ensure that system-critical resources are not adversely impacted. For specifics on this process, see the following KB: https://knowledge.broadcom.com/external/article?legacyId=90194

1. SSH into one of the Supervisor ControlPlane VMs. This is required because only the kubernetes-admin user has privileges to modify PackageInstall resources on the Supervisor Cluster.

2. Create a file named kapp-edit-ytt.yaml with the following content:

#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind":"Deployment", "metadata": {"name": "kapp-controller", "namespace":"tkg-system"}})
---
metadata:
  annotations:
    #@overlay/match missing_ok=True
    kapp.k14s.io/update-strategy: fallback-on-replace

3. From the above file create a secret in the namespace of the TKC:

kubectl create secret generic kapp-edit-ytt --from-file=kapp-edit-ytt.yaml -n <TKC_NAMESPACE>

Example:

kubectl create secret generic kapp-edit-ytt --from-file=kapp-edit-ytt.yaml -n test-namespace
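
Steps 2 and 3 can be combined into one small script. This is a sketch under stated assumptions rather than a supported procedure: the function name create_kapp_overlay_secret is hypothetical, the overlay content and the kubectl command are copied unchanged from the steps above, and the file is written to the current working directory.

```shell
#!/bin/sh
# Hypothetical helper covering steps 2 and 3.
# Argument: the TKC namespace (for example, test-namespace).
create_kapp_overlay_secret() {
  ns="$1"
  [ -n "$ns" ] || { echo "usage: create_kapp_overlay_secret <TKC_NAMESPACE>" >&2; return 2; }

  # Step 2: write the ytt overlay. The quoted 'EOF' stops the shell from
  # expanding anything inside the file content.
  cat > kapp-edit-ytt.yaml <<'EOF'
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind":"Deployment", "metadata": {"name": "kapp-controller", "namespace":"tkg-system"}})
---
metadata:
  annotations:
    #@overlay/match missing_ok=True
    kapp.k14s.io/update-strategy: fallback-on-replace
EOF

  # Step 3: store the overlay as a secret in the TKC namespace.
  kubectl create secret generic kapp-edit-ytt --from-file=kapp-edit-ytt.yaml -n "$ns"
}

# Usage:
#   create_kapp_overlay_secret test-namespace
```

Writing the file through a quoted here-document avoids copy-and-paste indentation errors, which matter because the overlay is YAML.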


4. Edit the PackageInstall (pkgi) resource created for kapp-controller in the namespace of the TKC:

kubectl edit pkgi <TKC_NAME>-kapp-controller -n <TKC_NAMESPACE>

Example:

kubectl edit pkgi test-cluster-kapp-controller -n test-namespace


5. Add the annotation ext.packaging.carvel.dev/ytt-paths-from-secret-name.0: kapp-edit-ytt under metadata.annotations of the pkgi resource, then save and exit the editor.
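
As an alternative to the interactive edit in steps 4 and 5, the same annotation can likely be applied with kubectl annotate. This is a sketch and an assumption, not part of the documented workaround: the function name annotate_kapp_pkgi is hypothetical, and the pkgi name follows the <TKC_NAME>-kapp-controller pattern from step 4.

```shell
#!/bin/sh
# Hypothetical helper: applies the annotation from step 5 without opening
# an editor. Arguments: <TKC_NAME> <TKC_NAMESPACE>.
annotate_kapp_pkgi() {
  tkc="$1"
  ns="$2"
  [ -n "$tkc" ] && [ -n "$ns" ] || {
    echo "usage: annotate_kapp_pkgi <TKC_NAME> <TKC_NAMESPACE>" >&2
    return 2
  }

  # The annotation value is the name of the secret created in step 3.
  kubectl annotate pkgi "${tkc}-kapp-controller" -n "$ns" \
    ext.packaging.carvel.dev/ytt-paths-from-secret-name.0=kapp-edit-ytt
}

# Usage:
#   annotate_kapp_pkgi test-cluster test-namespace
# To confirm the annotation is present afterwards:
#   kubectl get pkgi test-cluster-kapp-controller -n test-namespace -o yaml
```

Either approach leaves the same annotation on the resource; the interactive edit remains the method this article describes.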