During the upgrade from 1.26 to 1.27, an Image Pull error occurred | localhost:5000/tkg/sandbox/packages/core/vsphere-pv-csi

Article ID: 406200

Products

VMware vSphere Kubernetes Service

Issue/Introduction

You’re upgrading your Supervisor and Guest Clusters from v1.26 to v1.27, with plans to continue testing through v1.28 and v1.29.x. During the upgrade from v1.26 to v1.27, the CSI pods fail to start and your applications go offline. All CSI pods report repeated image pull failures.

In the kubelet logs, you see the following error:

Error: ErrImagePull

Failed to pull image "localhost:5000/tkg/sandbox/packages/core/vsphere-pv-csi@sha256:SHA":

rpc error: code = NotFound desc = failed to pull and unpack image "localhost:5000/tkg/sandbox/packages/core/vsphere-pv-csi@sha256:SHA": failed to resolve reference

Attempts to manually place the CSI image into the registry are unsuccessful. The image you provide lacks the proper tags and digests, so Kubernetes is still unable to resolve it. The CSI app remains stuck and does not deploy.

Cause

Your CSI app and its associated PackageInstall (PKGI) are both paused. The CSI app references image version 3.1.0, while the PKGI reports version 3.2.0 with reconcileSucceeded=true. Because both objects are paused, neither is actively reconciling, so the CSI app never moves to the 3.2.0 image.

Based on state inspection and upgrade behavior, you determine that the pause was introduced manually during or before the 1.26 lifecycle. The paused state prevents controllers from finding the correct image on the worker nodes. This causes your controller pods to remain unscheduled, and CSI functionality fails.

When you attempt to confirm the origin of the pause, you find that kube-apiserver audit logs are missing from the WCP support bundle. This prevents definitive tracing of the event, but you have a high degree of certainty that a manual pause was the root cause.
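
To confirm the paused state described above, you can read the spec.paused field on both objects. The following is a minimal sketch rather than an official tool: show_paused is a hypothetical helper name, and the kind, object name, and namespace are values you supply. It assumes kubectl is already pointed at the affected cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the spec.paused field of an App or PackageInstall.
# Arguments: kind (app or pkgi), object name, namespace.
# An unset field comes back as an empty string and is reported as "false".
show_paused() {
  kind=$1
  name=$2
  ns=$3
  paused=$(kubectl -n "$ns" get "$kind" "$name" -o jsonpath='{.spec.paused}')
  echo "$kind/$name paused=${paused:-false}"
}
```

For example, run show_paused app <app name> <namespace> and then show_paused pkgi <pkgi name> <namespace>. If either reports paused=true, the object is not reconciling.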

Resolution

Manually unpause the CSI app and PKGI object. This action allows reconciliation to complete successfully.

kubectl -n <namespace> patch app <app name> -p '{"spec":{"paused":false}}' --type=merge

kubectl -n <namespace> patch pkgi <pkgi name> -p '{"spec":{"paused":false}}' --type=merge

After unpausing:

  • The CSI app updates to image version 3.2.0
  • The controller pod enters a Running state
  • The TKC returns to Ready=true
  • Your workloads recover and become available
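
One way to check reconciliation after unpausing is to read the object's status conditions. The sketch below assumes the condition type is named ReconcileSucceeded, as kapp-controller reports it; is_reconciled is a hypothetical helper name, and the kind, object name, and namespace are values you supply.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only when the object's
# ReconcileSucceeded condition has status "True".
# Arguments: kind (app or pkgi), object name, namespace.
is_reconciled() {
  kind=$1
  name=$2
  ns=$3
  status=$(kubectl -n "$ns" get "$kind" "$name" \
    -o jsonpath='{.status.conditions[?(@.type=="ReconcileSucceeded")].status}')
  [ "$status" = "True" ]
}
```

For example, is_reconciled app <app name> <namespace> && echo reconciled prints a message only once the CSI app has reconciled. You can then list the CSI pods with kubectl get pods in the relevant namespace to confirm that the controller pod is Running.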

Confirm that the pause state does not revert during subsequent upgrades, and continue to monitor the CSI app’s reconciliation health.
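
To watch for a pause that reappears, you could list every paused object of a given kind across all namespaces before and after each upgrade. This is a sketch under the same assumptions as the patch commands above; list_paused is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list namespace/name for every object of the given
# kind (app or pkgi) whose spec.paused is true, across all namespaces.
list_paused() {
  kind=$1
  kubectl get "$kind" -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.paused}{"\n"}{end}' |
    awk -F'\t' '$3 == "true" { print $1 "/" $2 }'
}
```

Running list_paused app and list_paused pkgi should print nothing when no object is paused.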