Guest Cluster Pods Fail to Restart Due to kube-controller-manager Lock Access Failure after updating the imagerepository

Article ID: 400130

Products

VMware vSphere Kubernetes Service

Issue/Introduction

In a TKG Service guest cluster, pods failed to restart after the imagerepository for the tanzu-standard package repository was updated. Restarting kapp-controller had no effect: the pod's age remained unchanged, and no replacement pod appeared even after the existing pod was deleted manually. The related PackageInstall on the Supervisor Cluster entered a ReconcileFailed state with timeout errors referencing Carvel APIs and the kapp-controller deployment. Multiple pods were affected, which pointed to a systemic problem with control plane reconciliation.

Environment

VMware vSphere Kubernetes Service

Cause

Logs from all guest cluster control plane nodes showed the following recurring error for kube-controller-manager:

error retrieving resource lock kube-system/kube-controller-manager: Unauthorized

This error prevented kube-controller-manager from acquiring the leader election lock. Without the lock, it could not reconcile workloads, restart pods, or create new ones. As a result, deployments stalled and controller-based functions failed silently, even though the ReplicaSet and Deployment objects still reported ready.
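One way to confirm the error on a given cluster is to count its occurrences in the kube-controller-manager logs. The following is a minimal sketch, not part of the original article: the helper name count_lock_errors is hypothetical, and the example invocation assumes the kube-controller-manager pods carry the kubeadm-style label component=kube-controller-manager.

```shell
# Hypothetical helper: count log lines on stdin that contain the
# leader-election lock error quoted above.
count_lock_errors() {
  # -F matches the message as a fixed string, -c prints the match count.
  # grep exits non-zero when nothing matches, so "|| true" keeps the
  # helper from failing when the count is 0.
  grep -cF -- 'error retrieving resource lock kube-system/kube-controller-manager: Unauthorized' || true
}

# Example invocation against the guest cluster (label is an assumption):
#   kubectl -n kube-system logs -l component=kube-controller-manager --tail=200 | count_lock_errors
```

A non-zero count that keeps growing across repeated runs would be consistent with the failure described here.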

Resolution

To recover from the control plane failure:

1. SSH into each control plane node of the affected guest cluster.
2. Navigate to /etc/kubernetes/manifests/.
3. Temporarily move both of the following static pod manifest files out of the directory:
   - kube-controller-manager.yaml
   - kube-scheduler.yaml
4. Wait for the kubelet to terminate the pods.
5. Move both files back into /etc/kubernetes/manifests/.
6. Wait for the components to redeploy.
7. Verify that:
   - The kube-controller-manager and kube-scheduler pods are healthy.
   - Pods that were previously stuck begin to restart.
   - PackageInstall status transitions out of ReconcileFailed.
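The file moves in the steps above can be sketched as a small shell function to run as root on each control plane node. This is an illustrative sketch under stated assumptions, not a procedure from the original article: the function name bounce_static_pods, the holding directory /root/manifests-hold, and the 30-second pause are all hypothetical choices and should be adjusted to the environment.

```shell
# Assumed defaults; override by exporting the variables before calling.
MANIFEST_DIR="${MANIFEST_DIR:-/etc/kubernetes/manifests}"
HOLD_DIR="${HOLD_DIR:-/root/manifests-hold}"   # hypothetical holding directory
WAIT_SECONDS="${WAIT_SECONDS:-30}"             # assumed pause, tune as needed
MANIFESTS="kube-controller-manager.yaml kube-scheduler.yaml"

# Hypothetical helper: move both static pod manifests out, pause so the
# kubelet can stop the pods, then move the manifests back.
bounce_static_pods() {
  local f
  # Refuse to start unless both manifests are present, so that a
  # half-completed move is not left behind.
  for f in $MANIFESTS; do
    [ -f "$MANIFEST_DIR/$f" ] || { echo "missing: $MANIFEST_DIR/$f" >&2; return 1; }
  done
  mkdir -p "$HOLD_DIR" || return 1
  for f in $MANIFESTS; do
    mv "$MANIFEST_DIR/$f" "$HOLD_DIR/$f" || return 1
  done
  # Give the kubelet time to notice the removal and terminate the pods.
  # If crictl is available on the node, "crictl ps" can confirm that the
  # containers are gone before continuing.
  sleep "$WAIT_SECONDS"
  for f in $MANIFESTS; do
    mv "$HOLD_DIR/$f" "$MANIFEST_DIR/$f" || return 1
  done
}

# Example invocation on a control plane node:
#   bounce_static_pods
```

After the function returns, the verification step still applies. For example, assuming kubeadm-style labels, `kubectl -n kube-system get pods -l component=kube-controller-manager` on the guest cluster should show the pod Running with a fresh age, and `kubectl get packageinstall -A` on the Supervisor Cluster should eventually show the PackageInstall leaving ReconcileFailed.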
