vSphere Kubernetes Cluster is in an error state of TopologyReconcileFailed.
When an upgrade is initiated while the cluster is in this state, the upgrade becomes stuck and cannot progress.
While connected to the Supervisor cluster context, the following symptoms are present:
kubectl describe cluster <cluster name> -n <cluster namespace>
message: 'error computing the desired state of the Cluster topology: failed to apply patches'
reason: TopologyReconcileFailed
severity: Error
status: "False"
type: TopologyReconciled
'error computing the desired state of the Cluster topology: failed to apply patches: failed to generate patches for patch "nodeLabels": failed to generate JSON patches for "<Template>": failed to calculate value for template: failed to render template: "run.tanzu.vmware.com/tkr={{ index (index .TKR_DATA .builtin.controlPlane.version).labels \"run.tanzu.vmware.com/tkr\"}},run.tanzu.vmware.com/kubernetesDistributionVersion={{ index (index .TKR_DATA.builtin.controlPlane.version).labels \"run.tanzu.vmware.com/tkr\" }}\n": template: tpl:1:28: executing "tpl" at <index (index .TKR_DATA .builtin.<nodeObject>.version).labels "run.tanzu.vmware.com/tkr">: error calling index: index of untyped nil'
error computing the desired state of the Cluster topology: failed to apply patches: failed to generate patches for patch "nodeLabels": [failed to generate JSON patches for item with uid "<uid>": failed to calculate value for template: failed to render template: "run.tanzu.vmware.com/tkr={{ index (index .TKR_DATA .builtin.<nodeComponent>.version).labels \"run.tanzu.vmware.com/tkr\" }}
kubectl describe cluster <cluster name> -n <cluster namespace>
kubectl get tkc <cluster name> -n <cluster namespace>
kubectl get kcp,md,machine -n <cluster namespace>
vSphere 7.0 with Tanzu
vSphere 8.0 with Tanzu
This issue can occur on a vSphere Supervisor cluster regardless of whether or not it is managed by Tanzu Mission Control (TMC)
There is a misconfiguration or missing TKR_DATA in the TKR_DATA section of the affected vSphere Kubernetes cluster object.
TopologyReconcileFailed state can also occur if one of the node components are incorrectly configured.
This is caused by issues during a cluster upgrade which did not properly populate the TKR_DATA or by manual misconfiguration on the affected node component.
The TKR_DATA section of the affected cluster's cluster YAML or the misconfigured node component will need to be corrected.
Once the cause is corrected, the TopologyReconcileFailed error should clear and the cluster yaml will show TopologyReconciled status "True" state.
Please reach to VMware by Broadcom Technical Support referencing this KB article to resolve this.