Guest Cluster Pods Fail to Restart Due to kube-controller-manager Lock Access Failure after updating the imagerepository
search cancel

Guest Cluster Pods Fail to Restart Due to kube-controller-manager Lock Access Failure after updating the imagerepository

book

Article ID: 400130

calendar_today

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

In a TKG Service guest cluster, pods failed to restart after updating the imagerepository for the tanzu-standard package repository. Despite restarting kapp-controller, its pod age remained unchanged, and new pods failed to appear even when manually deleted. The related PackageInstall on the Supervisor Cluster entered a ReconcileFailed state with timeout errors referencing Carvel APIs and kapp-controller deployment. The issue affected multiple pods, suggesting a systemic problem with control plane reconciliation.

Environment

VMware vSphere Kubernetes Service

Cause

Logs from all guest cluster control plane nodes showed the following recurring error for kube-controller-manager:

error retrieving resource lock kube-system/kube-controller-manager: Unauthorized

This prevented the controller manager from acquiring the leader election lock, disabling its ability to reconcile workloads, restart pods, or create new ones. As a result, deployments stalled and controller-based functions failed silently despite the ReplicaSet and Deployment objects showing readiness.

Resolution

To recover from the control plane failure:

  1. SSH into each control plane node of the affected guest cluster.
  2. Navigate to /etc/kubernetes/manifests/.
  3. Temporarily move both of the following static pod manifest files out of the directory:
    • kube-controller-manager.yaml
    • kube-scheduler.yaml
  4. Wait for the kubelet to terminate the pods.
  5. Move both files back into /etc/kubernetes/manifests/.
  6. Wait for the components to redeploy.
  7. Verify that:
    • The kube-controller-manager and kube-scheduler pods are healthy.
    • Pods that were previously stuck begin to restart.
    • PackageInstall status transitions out of ReconcileFailed.