Symptoms:
- After upgrading to vCenter 7.0U3F (Build 20051473), or when running vCenter 7.0U3E (Build 19717403), with a Supervisor Cluster on a build earlier than vsc0.0.17, guest clusters cannot be updated to Tanzu Kubernetes Release (TKR) version v1.23.8---vmware.3-tkg.1.
- When an upgrade of the TKC is attempted, kubectl get tkc displays the TKR version as v1.23.8---vmware.3-tkg.1, but no new cluster nodes are created. The existing nodes keep running the version they had before the attempted upgrade.
- The TKC remains functional and reports READY as True.
- The vmware-system-tkg-controller-manager logs repeatedly show the error below when the controller tries to migrate the CoreDNS component to the version included in the TKR:
# kubectl logs -c manager -n vmware-system-tkg vmware-system-tkg-controller-manager-57bb4d68f6-g7tjw
...
I0211 23:23:09.233662 1 control_plane_sync.go:200] vmware-system-tkg-controller-manager/tanzukubernetescluster-spec-controller/ns01/clusterv1a2 "msg"="Executing rolling update for KubeadmControlPlane" "cluster"="clusterv1a2"
E0211 23:23:09.404734 1 tanzukubernetescluster_controller.go:418] vmware-system-tkg-controller-manager/tanzukubernetescluster-spec-controller/ns01/clusterv1a2 "msg"="Unable to reconcile control plane for cluster" "error"="Unable to sync KubeadmControlPlane for cluster \"clusterv1a2\": admission webhook \"validation.kubeadmcontrolplane.controlplane.cluster.x-k8s.io\" denied the request: KubeadmControlPlane.controlplane.cluster.x-k8s.io \"clusterv1a2-control-plane\" is invalid: spec.kubeadmConfigSpec.clusterConfiguration.dns.imageTag: Forbidden: cannot migrate CoreDNS up to '1.8.6' from '1.8.4': cannot migrate up to '1.8.6' from '1.8.4'" "cluster"="clusterv1a2"
...
- New wcpmachinetemplate objects for the cluster control plane are created without the older ones being deleted, so the number of these objects grows over time:
# kubectl get wcpmachinetemplate
NAME AGE
clusterv1a2-control-plane-4fgtx 43m
clusterv1a2-control-plane-85jr4 42m
clusterv1a2-control-plane-89bxk 15m
clusterv1a2-control-plane-99fxq 5m49s
clusterv1a2-control-plane-gc75d 15m
clusterv1a2-control-plane-hcrtj 32m
clusterv1a2-control-plane-jczz4 40m
clusterv1a2-control-plane-rcbht 2d18h
clusterv1a2-control-plane-t9ccj 35m
clusterv1a2-control-plane-tkdxq 25m
clusterv1a2-control-plane-zbwrz 42m
clusterv1a2-control-plane-zl99m 37m
clusterv1a2-worker-nodepool-1-5f2cp 2d18h
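The growth in control plane templates listed above can be tracked by counting the matching rows in the command output. The following is a minimal sketch, not part of the original article: the function name count_cp_templates is hypothetical, and it assumes the two-column NAME/AGE layout with a header row as shown above.

```shell
# Hypothetical helper: counts control plane template rows read from stdin.
# Assumes a header row followed by NAME/AGE rows, as in the listing above.
count_cp_templates() {
  awk 'NR > 1 && $1 ~ /-control-plane-/ { n++ } END { print n + 0 }'
}

# Example usage against a live Supervisor Cluster. The namespace ns01 is
# taken from the log path above and may differ in other environments:
#   kubectl get wcpmachinetemplate -n ns01 | count_cp_templates
```

Running the command a few minutes apart and comparing the counts should indicate whether templates are still accumulating.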
# kubectl edit tkc --------------------> Upgrading TKC version in this edit
tanzukubernetescluster.run.tanzu.vmware.com/clusterv1a2 edited
# kubectl get tkc --------------------> TKC version reports update
NAME CONTROL PLANE WORKER TKR NAME AGE READY TKR COMPATIBLE UPDATES AVAILABLE
clusterv1a2 3 2 v1.23.8---vmware.3-tkg.1 2d17h True True
# kubectl get machine --------------------> Machine objects are not updated
NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
clusterv1a2-control-plane-rfzh9 clusterv1a2 clusterv1a2-control-plane-rfzh9 vsphere://423c1849-dbc6-b8c8-4bbd-fdf5810d7ec0 Running 2d17h v1.22.9+vmware.1
clusterv1a2-control-plane-tlclm clusterv1a2 clusterv1a2-control-plane-tlclm vsphere://423c687f-d025-6eb4-210d-58876f37c971 Running 2d17h v1.22.9+vmware.1
clusterv1a2-control-plane-zjgg8 clusterv1a2 clusterv1a2-control-plane-zjgg8 vsphere://423c7301-fb2f-1795-e07e-2e07581db88b Running 2d17h v1.22.9+vmware.1
clusterv1a2-worker-nodepool-1-mjfsl-79dd67c94-6j7fx clusterv1a2 clusterv1a2-worker-nodepool-1-mjfsl-79dd67c94-6j7fx vsphere://423c7956-9048-a600-3049-70f4a4f65f22 Running 2d17h v1.22.9+vmware.1
clusterv1a2-worker-nodepool-1-mjfsl-79dd67c94-8rx4s clusterv1a2 clusterv1a2-worker-nodepool-1-mjfsl-79dd67c94-8rx4s vsphere://423c43d4-f09c-e039-be7a-155dafc4b70b Running 2d17h v1.22.9+vmware.1
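The mismatch shown above, where the TKC reports the new TKR while the Machine objects stay on the old version, can be checked by filtering the kubectl get machine output. This is a sketch under stated assumptions, not a tool from the original article: the function name stale_machines is hypothetical, and it assumes VERSION is the last column and begins with the Kubernetes version, as in the output above.

```shell
# Hypothetical helper: prints the name and version of every machine whose
# VERSION column (the last field) does not start with the expected prefix.
# Reads `kubectl get machine` output from stdin and skips the header row.
stale_machines() {
  awk -v want="$1" 'NR > 1 && index($NF, want) != 1 { print $1, $NF }'
}

# Example usage. The namespace ns01 is taken from the log path above and
# may differ in other environments:
#   kubectl get machine -n ns01 | stale_machines v1.23.8
```

Under these assumptions, empty output means every machine already runs the expected version. Any line printed is a machine that was not rolled.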