This article describes a likely root cause for the symptoms listed below and a way forward.
Symptoms:
For example:
$ kubectl get tkc,machine -A
NAMESPACE   NAME                                                   CONTROL PLANE   WORKER   TKR NAME                   AGE   READY   TKR COMPATIBLE   UPDATES AVAILABLE
test        tanzukubernetescluster.run.tanzu.vmware.com/tkc-test   1               1        v1.23.8---vmware.3-tkg.1   11m   False   True             [v1.23.15+vmware.1-tkg.4 v1.24.9+vmware.1-tkg.4 v1.24.11+vmware.1-fips.1-tkg.1]
NAMESPACE   NAME                                                                        CLUSTER    NODENAME                                           PROVIDERID                                       PHASE     AGE   VERSION
test        machine.cluster.x-k8s.io/tkc-test-65r5k-mjtg2                               tkc-test   tkc-test-65r5k-mjtg2                               vsphere://42124fc0-4c9b-41df-0458-9f41e371a223   Running   11m   v1.21.6+vmware.1
test        machine.cluster.x-k8s.io/tkc-test-servicesnodepool-hvngk-864748ff77-ngncq   tkc-test   tkc-test-servicesnodepool-hvngk-864748ff77-ngncq   vsphere://421258ae-19e5-0e9a-0a6e-e45051bb843c   Running   11m   v1.21.6+vmware.1
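One thing visible in the output above is that the TKC reports TKR v1.23.8 while both machines still run v1.21.6. The following is a minimal sketch for comparing the two strings; the helper names k8s_version and versions_match are hypothetical and not part of kubectl or any VMware tooling, and the input values are copied by hand from the TKR NAME and VERSION columns.

```shell
#!/bin/sh
# Hypothetical helper: reduce a TKR name (v1.23.8---vmware.3-tkg.1) or a
# machine VERSION string (v1.21.6+vmware.1) to its Kubernetes version.
k8s_version() {
  printf '%s\n' "$1" | sed -E 's/^(v[0-9]+\.[0-9]+\.[0-9]+).*$/\1/'
}

# Hypothetical helper: succeeds only if both strings carry the same
# Kubernetes version.
versions_match() {
  [ "$(k8s_version "$1")" = "$(k8s_version "$2")" ]
}

# Values copied from the example output above.
tkr_name='v1.23.8---vmware.3-tkg.1'
machine_version='v1.21.6+vmware.1'

if versions_match "$tkr_name" "$machine_version"; then
  echo "in sync"
else
  echo "mismatch: TKC reports $(k8s_version "$tkr_name"), machine runs $(k8s_version "$machine_version")"
fi
```

A mismatch on its own does not prove the root cause described in this article; it is only a quick way to spot clusters worth a closer look.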
Note: The issue was observed on vCenter 7. The same issue may also occur on vCenter 8, in which case the affected objects would be vspheremachinetemplates instead of wcpmachinetemplates.
Example:
If a TKC is running version 1.23.8, the admission webhook blocks an upgrade to 1.25.x or any later version. It allows upgrades only to the compatible versions listed in the "UPDATES AVAILABLE" column:
$ kubectl get tkc -A
NAMESPACE   NAME   CONTROL PLANE   WORKER   TKR NAME                   AGE   READY   TKR COMPATIBLE   UPDATES AVAILABLE
test        tkc    1               1        v1.23.8---vmware.3-tkg.1   20h   True    True             [v1.23.15+vmware.1-tkg.4 v1.24.9+vmware.1-tkg.4 v1.24.11+vmware.1-fips.1-tkg.1]
$ kubectl edit -n test tkc tkc
error: tanzukubernetesclusters.run.tanzu.vmware.com "tkc" could not be patched: admission webhook "default.validating.tanzukubernetescluster.run.tanzu.vmware.com" denied the request: version upgrade not compatible with rules
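Before editing the TKC, the intended target version can be checked against the "UPDATES AVAILABLE" list by hand or with a small script. The sketch below assumes the list has been copied verbatim from the kubectl output; is_allowed_upgrade is a hypothetical helper, and the 1.25 TKR name is an illustrative placeholder rather than a release confirmed by this article. It only mirrors the rule described above and does not reproduce the webhook's actual logic.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if TARGET appears in the bracketed,
# space-separated list copied from the UPDATES AVAILABLE column.
# Usage: is_allowed_upgrade TARGET "[v1 v2 ...]"
is_allowed_upgrade() {
  target=$1
  # Strip the surrounding brackets, then compare entry by entry.
  list=$(printf '%s' "$2" | tr -d '[]')
  for candidate in $list; do
    if [ "$candidate" = "$target" ]; then
      return 0
    fi
  done
  return 1
}

# List copied from the example output above.
updates='[v1.23.15+vmware.1-tkg.4 v1.24.9+vmware.1-tkg.4 v1.24.11+vmware.1-fips.1-tkg.1]'

# Placeholder 1.25 name (assumption), followed by a version from the list.
for target in 'v1.25.7+vmware.3-fips.1-tkg.1' 'v1.24.9+vmware.1-tkg.4'; do
  if is_allowed_upgrade "$target" "$updates"; then
    echo "$target: listed, upgrade should be accepted"
  else
    echo "$target: not listed, expect the webhook to deny the request"
  fi
done
```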
Example:
The suggested way forward is to manually delete every TKC left in an inconsistent state by the incomplete initial upgrade. Once these clusters are deleted, all associated wcpmachinetemplates/vspheremachinetemplates are deleted automatically, and memory pressure on the Supervisor VMs decreases.
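Because deletion is destructive, one cautious approach is to generate the delete commands first and review them before running anything. The sketch below assumes you have already identified the affected clusters and can supply them as "NAMESPACE NAME" pairs; print_tkc_delete_commands is a hypothetical helper that only prints commands and never contacts the cluster.

```shell
#!/bin/sh
# Hypothetical helper: read "NAMESPACE NAME" pairs on stdin and print one
# kubectl delete command per pair. Nothing is executed here.
print_tkc_delete_commands() {
  while read -r ns name; do
    # Skip blank or incomplete lines.
    [ -n "$ns" ] && [ -n "$name" ] || continue
    printf 'kubectl delete tanzukubernetescluster %s -n %s\n' "$name" "$ns"
  done
}

# Example input taken from the symptom output earlier in this article.
printf 'test tkc-test\n' | print_tkc_delete_commands
```

After reviewing the printed commands, run them one at a time. The template count can then be rechecked with a listing such as kubectl get wcpmachinetemplates -A (or vspheremachinetemplates on vCenter 8) to confirm that the stale objects are gone.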