TKGI internal pods in ErrImagePull state after disk pressure event

Article ID: 335092

Products

VMware

Issue/Introduction

Symptoms:
In an air-gapped environment, a TKGI worker node experienced disk pressure.
After garbage collection was triggered to remove local container images and reclaim disk space, some TKGI internal pods went into the ErrImagePull state (the kubelet log below records the reason as ErrImageNeverPull).
Error from kubelet.stderr.log on the worker:
 E1222 07:17:24.480877    9777 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"vsphere-webhook\" with ErrImageNeverPull: \"Container image \\\"gcr.io/cloud-provider-vsphere/csi/release/syncer:v2.5.4\\\" is not present with pull policy of Never\"" pod="vmware-system-csi/vsphere-csi-webhook-596f8b679c-pn2sz" podUID=5cde00e7-1b16-4ab9-88dd-2a9bb5ce668a
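To see which images are affected, the kubelet log can be filtered for the ErrImageNeverPull message shown above. The following is a minimal sketch; the function name `missing_images` is hypothetical, and the log location used in the usage comment (/var/vcap/sys/log/kubelet/kubelet.stderr.log) is an assumption that should be checked on the worker.

```shell
#!/bin/sh
# Print the distinct image references that the kubelet reported as
# "not present with pull policy of Never" in the given log file.
missing_images() {
    log_file="$1"
    grep 'ErrImageNeverPull' "$log_file" \
        | sed -n 's/.*Container image [\\"]*\([^\\" ]*\)[\\"]* is not present.*/\1/p' \
        | sort -u
}

# Usage (log path is an assumption, adjust as needed):
#   missing_images /var/vcap/sys/log/kubelet/kubelet.stderr.log
```

Each line of output is one image that has to be reloaded onto the worker.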


Cause

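If TKGI internal pods are not running, for any reason, while garbage collection runs in an air-gapped environment, their container images are considered unused and are deleted from the worker node. Because these images use a pull policy of Never, the kubelet does not pull them again, and the pods cannot start until the images are reloaded locally.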

Resolution

The recommendation is to monitor node disk utilization and to free up or increase disk capacity so that garbage collection is not triggered.
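As an illustration of such monitoring, the sketch below warns when the filesystem holding a given path passes a usage threshold. The function names and the default threshold of 80 are hypothetical choices; the upstream kubelet default for imageGCHighThresholdPercent is 85, but the value configured by TKGI may differ. The path in the usage comment (/var/vcap/data) is likewise an assumption about where container images are stored on the worker.

```shell
#!/bin/sh
# Print the used percentage (integer, without the % sign) of the
# filesystem that holds the given path.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 and print a warning when usage is at or above the threshold
# (second argument, default 80); otherwise print OK and return 1.
disk_near_gc() {
    path="$1"
    threshold="${2:-80}"
    used=$(disk_used_pct "$path")
    if [ "$used" -ge "$threshold" ]; then
        echo "WARNING: $path is ${used}% used (threshold ${threshold}%)"
        return 0
    fi
    echo "OK: $path is ${used}% used (threshold ${threshold}%)"
    return 1
}

# Usage (path is an assumption, adjust as needed):
#   disk_near_gc /var/vcap/data 80
```

Keeping the warning threshold below the garbage collection threshold leaves time to free up or extend the disk before any images are removed.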

Workaround:
Rerunning the post-start script of the related job on the affected worker node reloads the internal container images.
Once you have identified the internal pod that is in the ErrImagePull state, run the corresponding script manually on that worker.
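One way to identify the affected pods is to filter the output of `kubectl get pods --all-namespaces` on its STATUS column. The sketch below assumes the default column layout (NAMESPACE, NAME, READY, STATUS, ...); the function name `failing_image_pods` is hypothetical.

```shell
#!/bin/sh
# Read `kubectl get pods --all-namespaces` output on stdin and print
# namespace/name for every pod whose STATUS is an image error.
failing_image_pods() {
    awk '$4 ~ /^ErrImage/ || $4 == "ImagePullBackOff" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get pods --all-namespaces | failing_image_pods
```

Adding `-o wide` to the kubectl command also shows the node each pod is scheduled on, which indicates the worker where the script has to be run.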

The following are some of the internal image job scripts:
TKGI CSI images can be reloaded onto the affected worker node by running the following script:
/var/vcap/jobs/csi-images/bin/post-start

TKGI Sink images can be reloaded onto the affected worker node by running the following script:
/var/vcap/jobs/sink-resources-images/bin/post-start

TKGI Telemetry images can be reloaded onto the affected worker node by running the following script:
/var/vcap/jobs/telemetry-agent-image/bin/post-start
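The three scripts above can be wrapped in a small helper that selects the script by image group and checks that it exists before running it. This is only a sketch: the group names (csi, sink, telemetry) and both function names are hypothetical, and it assumes the helper is run on the affected worker with sufficient privileges (likely root).

```shell
#!/bin/sh
# Map an image group to the post-start script listed in this article.
reload_script_for() {
    case "$1" in
        csi)       echo /var/vcap/jobs/csi-images/bin/post-start ;;
        sink)      echo /var/vcap/jobs/sink-resources-images/bin/post-start ;;
        telemetry) echo /var/vcap/jobs/telemetry-agent-image/bin/post-start ;;
        *)         echo "unknown image group: $1" >&2; return 1 ;;
    esac
}

# Run the script for the given group if it is present and executable.
reload_images() {
    script=$(reload_script_for "$1") || return 1
    if [ ! -x "$script" ]; then
        echo "not found or not executable: $script" >&2
        return 1
    fi
    "$script"
}

# Usage, on the affected worker:
#   reload_images csi
```

After the script finishes, the pod should leave the error state once the kubelet retries the container start; deleting the pod so that it is recreated is another option if it does not recover.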