Upgrading TKC from v1.28.7 (Legacy TKr) to v1.29.4 (Non-legacy KR) is stuck and not progressing.
While connected to the Supervisor cluster context, the following symptoms are observed:
# kubectl get tkc -A
NAMESPACE NAME CONTROL PLANE WORKER KUBERNETES RELEASE NAME AGE READY
<namespace> <cluster name> # # v1.29.4---vmware.3-fips.1-tkg.1 ###d False
# kubectl get cluster -A
NAMESPACE NAME CLUSTERCLASS PHASE AGE VERSION
<namespace> <cluster name> builtin-generic-v#.#.# Provisioned ###d v1.28.7+vmware.1-fips.1
# kubectl -n <namespace> describe tkc <cluster name>
Conditions:
Last Transition Time: YYYY-MM-DDTHH:MM:SSZ
Message: Error in fetching ClusterBootstrap
Reason: ClusterBootstrapFailed
Severity: Warning
Status: False
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Warning ##s (x## over #m##s) svc-tkg-domain-c<id>/svc-tkg-domain-c<id>-tkg-controller/tanzukubernetescluster-spec-controller found pre-existing kapp-controller in the workload cluster before initiating upgrade; requires user remediation
kubectl -n svc-tkg-domain-c<id> logs deployment/vmware-system-tkg-controller-manager
E0701 HH:MM:SS.sssss 1 regeneratetkrdata_controller.go:97] "regenerating TKR_DATA" err=
Could not resolve KR/OSImage
Missing compatible KR/OSImage for the cluster
Control Plane, filters: {k8sVersionPrefix: v1.28.7+vmware.1-fips.1, osImageSelector: os-name=photon,tkr.tanzu.vmware.com/standard}
MachineDeployment <MachineDeployment NAME>, filters: {k8sVersionPrefix: v1.28.7+vmware.1-fips.1, osImageSelector: os-name=<os>}

vCenter 8.0u3
vSphere Supervisor
VKS 3.4.0 or lower
The upgrade from a legacy KR to a non-legacy KR fails when an existing kapp-controller is present in the legacy TKC.
On legacy KRs, Kapp-Controller could optionally be installed manually in workload clusters.
All non-legacy KRs include Kapp-Controller and install it automatically in the workload cluster.
As a result, if Kapp-Controller was installed manually in the workload cluster before upgrading to a non-legacy KR, this pre-existing kapp-controller issue will appear.
The system-installed Kapp-Controller that ships with non-legacy KRs fails to install because it attempts to assume ownership of kapp-controller-owned objects that are still owned by the manually installed kapp-controller.
Issue Sequence:
KR v1.28.7 is the latest and final legacy release.
After applying the workaround in this KB, the issue will not reappear in subsequent upgrades.
This workaround involves checking whether a clusterbootstrap exists and applying an annotation that transfers ownership from the manually installed kapp-controller to the system-installed kapp-controller.
Check whether a clusterbootstrap already exists for the affected cluster:
kubectl get clusterbootstrap -n <affected workload cluster namespace>
Create the YAML file for the placeholder clusterbootstrap:
Note: This YAML assumes that the desired KR version is v1.29.4.
cat > placeholder-clusterbootstrap.yaml << EOF
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: ClusterBootstrap
metadata:
name: "<affected workload cluster>"
namespace: "<affected workload cluster namespace>"
annotations:
tkg.tanzu.vmware.com/add-missing-fields-from-tkr: "v1.29.4---vmware.3-fips.1-tkg.1"
spec:
paused: true
EOF

Apply the placeholder clusterbootstrap:
kubectl apply -f placeholder-clusterbootstrap.yaml
Confirm that the placeholder clusterbootstrap was created:
kubectl get -n <affected workload cluster namespace> clusterbootstrap <affected workload cluster>
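If the same remediation must be repeated for several clusters, the heredoc above can be parameterized so the cluster name, namespace, and KR version are set in one place. This is a sketch only; the cluster name and namespace below are placeholders, and the version assumes the desired KR from this KB is v1.29.4:

```shell
# Sketch only: replace the placeholder values with those from your environment.
TKR_VERSION="v1.29.4---vmware.3-fips.1-tkg.1"   # desired non-legacy KR version
CLUSTER_NAME="example-cluster"                  # affected workload cluster (placeholder)
CLUSTER_NS="example-namespace"                  # its vSphere Namespace (placeholder)

cat > placeholder-clusterbootstrap.yaml <<EOF
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: ClusterBootstrap
metadata:
  name: "${CLUSTER_NAME}"
  namespace: "${CLUSTER_NS}"
  annotations:
    tkg.tanzu.vmware.com/add-missing-fields-from-tkr: "${TKR_VERSION}"
spec:
  paused: true
EOF

# Sanity check: confirm the annotation landed in the generated manifest
grep add-missing-fields-from-tkr placeholder-clusterbootstrap.yaml
```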
Define a temporary timestamp variable:
CREATION_TIMESTAMP=$(kubectl get clusterbootstrap <affected workload cluster> -n <affected workload cluster namespace> -o jsonpath='{.metadata.creationTimestamp}')
kubectl patch clusterbootstrap <affected workload cluster> -n <affected workload cluster namespace> --subresource=status --type='merge' --patch="{\"status\":{\"conditions\":[{\"status\":\"False\",\"type\":\"Kapp-Controller-Workaround\",\"lastTransitionTime\":\"$CREATION_TIMESTAMP\"}]}}"
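The inline patch above is easy to mangle with shell quoting, so it can help to assemble and validate the payload locally before running the patch. A sketch, where the timestamp is a placeholder standing in for the CREATION_TIMESTAMP captured above, and python3 is assumed to be available:

```shell
# Sketch: build the same status patch locally so it can be inspected before patching.
# The timestamp below is a placeholder; in practice use the captured CREATION_TIMESTAMP.
CREATION_TIMESTAMP="2024-07-01T00:00:00Z"
PATCH="{\"status\":{\"conditions\":[{\"status\":\"False\",\"type\":\"Kapp-Controller-Workaround\",\"lastTransitionTime\":\"$CREATION_TIMESTAMP\"}]}}"

# Validate that the payload is well-formed JSON (assumes python3 is on the PATH)
echo "$PATCH" | python3 -c 'import json,sys; json.load(sys.stdin); print("valid JSON")'
echo "$PATCH"
```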
kubectl -n <affected workload cluster namespace> get machines | grep <affected workload cluster>

Confirm that the clusterbootstrap is paused:
kubectl -n <affected workload cluster namespace> get clusterbootstrap <affected workload cluster> -o yaml | grep -i pause
paused: true
Unpause the clusterbootstrap; this will generate a kapp-controller for the non-legacy KR:
kubectl -n <affected workload cluster namespace> patch clusterbootstrap <affected workload cluster> --type=merge --patch '{"spec":{"paused":false}}'
The steps below are performed from the Supervisor cluster context.
Check the kapp-controller PackageInstall for the cluster; at this point it is expected to be in a failed reconcile state:
kubectl -n <affected workload cluster namespace> get pkgi <affected workload cluster>-kapp-controller
NAME PACKAGE NAME PACKAGE VERSION DESCRIPTION
<affected workload cluster>-kapp-controller kapp-controller.tanzu.vmware.com 0.50.0+vmware.1-tkg.1-vmware Reconcile failed: Error (see .status.usefulErrorMessage for details)
Create a ytt overlay file that will be stored in a secret called "kapp-edit-ytt":
cat > kapp-edit-ytt.yaml <<EOF
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind":"Deployment", "metadata": {"name": "kapp-controller", "namespace":"tkg-system"}})
---
metadata:
annotations:
#@overlay/match missing_ok=True
kapp.k14s.io/update-strategy: fallback-on-replace
EOF
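For context, this overlay asks ytt to merge an annotation into the kapp-controller Deployment in the tkg-system namespace. The kapp.k14s.io/update-strategy: fallback-on-replace annotation tells kapp that, if an in-place update of the object fails, it should delete and recreate it, which is what lets the system-installed kapp-controller take over the pre-existing Deployment. The merged Deployment metadata should look roughly like the following (a sketch, not live output):

```yaml
# Expected shape of the Deployment metadata after the overlay is applied (sketch)
metadata:
  annotations:
    kapp.k14s.io/update-strategy: fallback-on-replace
```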
kubectl -n <affected workload cluster namespace> create secret generic kapp-edit-ytt --from-file=kapp-edit-ytt.yaml
kubectl -n <affected workload cluster namespace> get secrets kapp-edit-ytt
NAME TYPE DATA AGE
kapp-edit-ytt Opaque 1 #s
Annotate the PackageInstall so the overlay secret is applied during reconciliation:
kubectl -n <affected workload cluster namespace> annotate pkgi <affected workload cluster>-kapp-controller ext.packaging.carvel.dev/ytt-paths-from-secret-name.0=kapp-edit-ytt
Confirm that the PackageInstall now reconciles successfully:
kubectl -n <affected workload cluster namespace> get pkgi <affected workload cluster>-kapp-controller
NAME PACKAGE NAME PACKAGE VERSION DESCRIPTION
<affected workload cluster>-kapp-controller kapp-controller.tanzu.vmware.com 0.50.0+vmware.1-tkg.1-vmware Reconcile succeeded
Monitor the machines as they roll to the new version:
kubectl -n <affected workload cluster namespace> get machines | grep <affected workload cluster>

The TKC will become Ready True once all nodes are stabilized on the desired version:
kubectl -n <affected workload cluster namespace> get tkc <affected workload cluster>
NAMESPACE NAME CONTROL PLANE WORKER KUBERNETES RELEASE NAME AGE READY
<affected workload cluster namespace> <affected workload cluster> # # v1.29.4---vmware.3-fips.1-tkg.1 ###d True
Once the upgrade completes, clean up the workaround artifacts. Remove the annotation from the PackageInstall:
kubectl -n <affected workload cluster namespace> annotate pkgi <affected workload cluster>-kapp-controller ext.packaging.carvel.dev/ytt-paths-from-secret-name.0-
kubectl -n <affected workload cluster namespace> delete secret kapp-edit-ytt
rm kapp-edit-ytt.yaml
rm placeholder-clusterbootstrap.yaml