In a TKG Service guest cluster, pods failed to restart after updating the imagerepository for the tanzu-standard package repository. Despite restarting kapp-controller, its pod age remained unchanged, and new pods failed to appear even when manually deleted. The related PackageInstall on the Supervisor Cluster entered a ReconcileFailed state with timeout errors referencing Carvel APIs and kapp-controller deployment. The issue affected multiple pods, suggesting a systemic problem with control plane reconciliation.
VMware vSphere Kubernetes Service
Logs from all guest cluster control plane nodes showed the following recurring error for kube-controller-manager:
error retrieving resource lock kube-system/kube-controller-
This prevented the controller manager from acquiring the leader election lock, disabling its ability to reconcile workloads, restart pods, or create new ones. As a result, deployments stalled and controller-based functions failed silently despite the ReplicaSet and Deployment objects showing readiness.
To recover from the control plane failure: