TKGI images not reloaded after Disk pressure

Article ID: 382757

Products

VMware Tanzu Kubernetes Grid Integrated Edition
VMware Tanzu Kubernetes Grid Integrated (TKGi)
VMware Tanzu Kubernetes Grid Integrated Edition (Core)
VMware Tanzu Kubernetes Grid Integrated Edition 1.x

Issue/Introduction

When a TKGi worker node experiences disk pressure, kubelet image garbage collection deletes unused images. The kubelet log shows:

I1104 08:34:01.389267    8986 image_gc_manager.go:391] "Attempting to delete unused images"

The disk-pressure-watch script detects that the node has experienced disk pressure and reloads the images. However, it does not reload all of the deleted images.

Environment

TKGi 1.20

Cause

The disk-pressure-watch script has been observed to exit while it is reloading images, which leaves some images unloaded. The exit and the subsequent restart are recorded in /var/vcap/monit/monit.log:

[UTC Nov  4 08:38:58] error    : 'disk-pressure-watch' process is not running
[UTC Nov  4 08:38:58] info     : 'disk-pressure-watch' trying to restart
[UTC Nov  4 08:38:58] info     : 'disk-pressure-watch' start: /var/vcap/jobs/disk-pressure-watch/bin/ctl
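
To confirm the same pattern on an affected node, the monit log can be filtered for entries about this process. The following is a minimal sketch, not part of the product: the function name show_watch_events is hypothetical, and the default path is the one cited above.

```shell
# Hypothetical helper: print monit log entries for the disk-pressure-watch process.
# The log path defaults to the location cited in this article; pass another
# path as the first argument to inspect a copy of the log instead.
show_watch_events() {
  log_file="${1:-/var/vcap/monit/monit.log}"
  # -F matches the quoted process name literally, as monit writes it.
  grep -F "'disk-pressure-watch'" "$log_file"
}

# Example, run as root on the worker node:
#   show_watch_events
```

A "process is not running" entry shortly after the garbage-collection timestamp suggests the script exited during the reload.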

Resolution

The images can be reloaded manually by running the relevant script for each missing image. First, SSH to the affected worker node:

# bosh -d <service-instance-ID> ssh <worker-node>

Switch to the root user and execute the relevant script from the list below:

# sudo -i
# /var/vcap/jobs/load-images/bin/post-start
# /var/vcap/jobs/telemetry-agent-image/bin/post-start
# /var/vcap/jobs/wavefront-proxy-images/bin/post-start
# /var/vcap/jobs/vrops-errand/bin/post-start
# /var/vcap/jobs/sink-resource-images/bin/post-start
# /var/vcap/jobs/load-antrea-images/bin/post-start
# /var/vcap/jobs/csi-images/bin/post-start
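
Not every job in the list is necessarily present on every worker node. As a rough sketch for checking which scripts exist before running one, the helper below prints the post-start scripts that are present and executable. The function name list_image_loaders is hypothetical; the job names and the /var/vcap/jobs base directory are taken from the list above. It only lists scripts and does not run them.

```shell
# Hypothetical helper: print the image-loading post-start scripts found on this node.
# Job names come from the list in this article; the base directory defaults
# to /var/vcap/jobs and can be overridden with the first argument.
list_image_loaders() {
  jobs_dir="${1:-/var/vcap/jobs}"
  for job in load-images telemetry-agent-image wavefront-proxy-images \
             vrops-errand sink-resource-images load-antrea-images csi-images; do
    script="$jobs_dir/$job/bin/post-start"
    # Report only scripts that exist and are executable.
    if [ -x "$script" ]; then
      echo "$script"
    fi
  done
}

# Example, run as root on the worker node:
#   list_image_loaders
```

Each path it prints can then be executed as shown in the list above.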