During disk pressure events on a worker node (when system disk usage reaches 85%), the Kubelet image garbage collector activates and removes images from the file system until disk usage drops to the low threshold (80% by default). In some cases, this may result in core images, such as CoreDNS and the Metrics Server, being deleted. If either of these pods is later scheduled to this worker node, it may fail to start because its image is missing (ImagePullBackOff).
Each worker node includes a job called disk-pressure-watch, located at /var/vcap/jobs/disk-pressure-watch, which is responsible for detecting disk pressure and reloading core images onto the node. The job's script monitors the DiskPressure condition on the worker node and, when the condition becomes True, reloads the core images.
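To check the same condition by hand, the DiskPressure status can be read from the node object. The following is a minimal Python sketch, not the disk-pressure-watch implementation: it assumes the node JSON comes from "kubectl get node <node-name> -o json", and the function name and sample data are illustrative only.

```python
import json


def disk_pressure_status(node_json: str) -> str:
    """Return the DiskPressure condition status ("True", "False" or "Unknown")
    from a Kubernetes Node object serialized as JSON."""
    node = json.loads(node_json)
    for condition in node.get("status", {}).get("conditions", []):
        if condition.get("type") == "DiskPressure":
            return condition.get("status", "Unknown")
    # The condition is absent from the node object.
    return "Unknown"


# Trimmed-down, made-up sample of a Node object (illustrative data only).
sample_node = json.dumps({
    "kind": "Node",
    "status": {
        "conditions": [
            {"type": "MemoryPressure", "status": "False"},
            {"type": "DiskPressure", "status": "False"},
            {"type": "Ready", "status": "True"},
        ]
    },
})

print("DiskPressure:", disk_pressure_status(sample_node))
```

A status of False while images are being removed matches the symptom described in this article.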
You will notice that the disk-pressure-watch log (/var/vcap/sys/log/disk-pressure-watch/disk-pressure-watch.stdout.log) never reports pressure:
Sleeping until DiskPressure condition occurs on XXX.XXX.XXX.XXX
Meanwhile, the Kubelet log (/var/vcap/sys/log/kubelet/kubelet.stderr.log) shows that the image garbage collector has run:
I1014 10:30:08.095150 8836 image_gc_manager.go:323] "Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold" usage=85 highThreshold=85 amountToFree=4356579328 lowThreshold=80
I1014 10:30:08.110869 8836 image_gc_manager.go:400] "Removing image to free bytes" imageID="sha256:a96c6437238723728dhajs8dd082b360d8e054d13c44df622d9197df63efea8e" size=296235462
I1014 10:30:12.855969 8836 image_gc_manager.go:400] "Removing image to free bytes" imageID="sha256:3f51440822fd2ab948ff047650c955a0ejfdh7ydsjh28idha92j960807558270" size=155378306
I1014 10:30:12.930358 8836 image_gc_manager.go:400] "Removing image to free bytes" imageID="sha256:c0a9e718128026a3859f940d36a40e87w8hfcwc9hccw9w9dwjcchw979w8ew151" size=208980302
I1014 10:30:13.038632 8836 image_gc_manager.go:400] "Removing image to free bytes" imageID="sha256:9a3c49cecfc89a09cd8d613adusdhdjhsdw8d78sdh3ie09dwdw208e2f1b91a81" size=145797095
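To confirm from the Kubelet log how much was removed, entries like the ones above can be summarized with a short script. This is a rough Python sketch that assumes the log format shown in this article; the regular expressions, the function name, and the shortened image IDs in the sample are illustrative and not part of any TKGI tooling.

```python
import re

# Matches the line logged when image garbage collection starts.
TRIGGER_RE = re.compile(
    r"usage=(?P<usage>\d+) highThreshold=(?P<high>\d+) "
    r"amountToFree=(?P<amount>\d+) lowThreshold=(?P<low>\d+)"
)
# Matches each "Removing image to free bytes" line.
REMOVE_RE = re.compile(
    r'"Removing image to free bytes" imageID="(?P<image_id>[^"]+)" size=(?P<size>\d+)'
)


def summarize_image_gc(log_text: str) -> dict:
    """Collect garbage collection triggers and removed images from kubelet log text."""
    triggers = [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in TRIGGER_RE.finditer(log_text)
    ]
    removed = [
        (match["image_id"], int(match["size"]))
        for match in REMOVE_RE.finditer(log_text)
    ]
    return {
        "triggers": triggers,
        "removed": removed,
        "bytes_removed": sum(size for _, size in removed),
    }


# Sample lines in the format shown above; the image IDs are shortened placeholders.
sample_log = (
    'I1014 10:30:08.095150 8836 image_gc_manager.go:323] "Disk usage on image filesystem is over the high threshold, '
    'trying to free bytes down to the low threshold" usage=85 highThreshold=85 amountToFree=4356579328 lowThreshold=80\n'
    'I1014 10:30:08.110869 8836 image_gc_manager.go:400] "Removing image to free bytes" imageID="sha256:aaaa" size=296235462\n'
    'I1014 10:30:12.855969 8836 image_gc_manager.go:400] "Removing image to free bytes" imageID="sha256:bbbb" size=155378306\n'
)

summary = summarize_image_gc(sample_log)
print(len(summary["removed"]), "images removed,", summary["bytes_removed"], "bytes freed")
```

On a worker VM, the contents of /var/vcap/sys/log/kubelet/kubelet.stderr.log could be passed to summarize_image_gc instead of the sample text.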
Kubelet starts two threads that monitor disk usage: the image garbage collector and the eviction manager. By default, both use a threshold of 85% disk usage (for the eviction manager, this corresponds to imagefs.available=15%).
Because the thresholds are the same, it is possible for the image garbage collector to be triggered and start removing old images without the eviction manager raising a disk pressure signal. As a result, disk-pressure-watch cannot detect the DiskPressure=True event and does not reload the deleted system component images.
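The interaction between the two thresholds can be illustrated with a simplified model. This Python sketch is an approximation, not Kubelet code: it assumes the image garbage collector starts once usage reaches its high threshold, and that the eviction manager signals pressure only once available space falls below its imagefs.available threshold. The function names are hypothetical.

```python
def image_gc_triggers(usage_percent: int, high_threshold: int = 85) -> bool:
    # Assumption: garbage collection starts when usage reaches the high threshold.
    return usage_percent >= high_threshold


def eviction_signals_pressure(usage_percent: int, imagefs_available_percent: int = 15) -> bool:
    # Assumption: pressure is signalled when available space drops below the threshold.
    available_percent = 100 - usage_percent
    return available_percent < imagefs_available_percent


# Compare the default eviction threshold (15%) with a larger one (20%).
for imagefs_available in (15, 20):
    print(f"imagefs.available={imagefs_available}%")
    for usage in (80, 81, 85, 86):
        print(
            f"  usage={usage}%",
            f"image_gc={image_gc_triggers(usage)}",
            f"disk_pressure={eviction_signals_pressure(usage, imagefs_available)}",
        )
```

Under these assumptions, at exactly 85% usage with the default values the garbage collector runs while no pressure is signalled. With imagefs.available set to 20%, pressure is signalled from 81% usage, before the garbage collector's 85% threshold is reached.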
As a quick workaround, you can reload the missing images by running /var/vcap/jobs/load-images/bin/post_start on the affected worker VM.
To resolve this permanently, update the plan associated with the cluster to include the imagefs.available configuration flag in Kubelet Customization. Follow the steps below:
1. Add the "imagefs.available=XX%" flag, with XX larger than 15 (for example, 20), so that the eviction manager is triggered before the image garbage collector.
2. After applying the configuration, run "tkgi upgrade-cluster service_instance_xxx" for each cluster, without using the Upgrade All Clusters errand.
References: