Kubelet disk pressure not reported and garbage collector not triggered which results in pods failing to start


Article ID: 298619



Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

During normal cluster operation, the persistent disk on some workers fills up to 91% without the garbage collector being triggered.

Running
docker image prune -a
removes the unused images and recovers the disk space.

Restarting the kubelet service has the same effect: the images are cleaned up as well.

Running
kubectl describe nodes NODEID | grep -i pres

shows no sign of disk pressure.
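
The same check can be scripted against the JSON form of the node object (for example the output of kubectl get node NODEID -o json). The following is a minimal sketch, not part of the original article: the helper name pressure_conditions and the sample data are hypothetical, and the sample was written by hand to mirror the symptom described here rather than captured from a cluster.

```python
import json


def pressure_conditions(node_json: str) -> dict:
    """Return {condition type: bool} for every '*Pressure' condition of a Node."""
    node = json.loads(node_json)
    conditions = node.get("status", {}).get("conditions", [])
    return {
        c["type"]: c["status"] == "True"
        for c in conditions
        if c["type"].endswith("Pressure")
    }


# Hand-written illustrative sample (assumption), shaped like a Node's status.
SAMPLE_NODE = json.dumps({
    "status": {
        "conditions": [
            {"type": "MemoryPressure", "status": "False"},
            {"type": "DiskPressure", "status": "False"},
            {"type": "PIDPressure", "status": "False"},
            {"type": "Ready", "status": "True"},
        ]
    }
})

if __name__ == "__main__":
    for name, active in pressure_conditions(SAMPLE_NODE).items():
        print(f"{name}: {'REPORTED' if active else 'not reported'}")
```

On an affected worker, DiskPressure would be expected to stay at "False" even though the disk is nearly full.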

After connecting to the worker over SSH,
docker system df
and
df -h | grep /var/vcap/store

show a large reclaimable amount of image data and less than 9% free space.
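
To read the free-space figure programmatically on a worker, a small sketch such as the one below may help. It is an illustration only: the helper name percent_free is hypothetical, and the value it computes (space available to unprivileged users divided by total size) can differ slightly from the Use% column printed by df.

```python
import os
import shutil


def percent_free(path: str) -> float:
    """Percentage of the filesystem holding `path` that is still available."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


if __name__ == "__main__":
    # /var/vcap/store is the path used in this article; fall back to "/"
    # when the script is run on a machine that does not have it.
    store = "/var/vcap/store"
    target = store if os.path.isdir(store) else "/"
    print(f"{target}: {percent_free(target):.1f}% free")
```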

Running
kubectl proxy
and then opening the following URL on localhost returns the kubelet configuration of a node:
http://127.0.0.1:8001/api/v1/nodes/c6a0d708-f97c-4ff9-a1a3-6256c2558fbc/proxy/configz
Here c6a0d708-f97c-4ff9-a1a3-6256c2558fbc is the ID of the node you want details about (as listed by kubectl get nodes).
The following part of the output shows the default values:
"evictionHard": {
  "imagefs.available": "15%",
  "memory.available": "100Mi",
  "nodefs.available": "10%",
  "nodefs.inodesFree": "5%"
},
"evictionPressureTransitionPeriod": "5m0s"

This is evidence that the kubelet is configured to evict pods when there is disk pressure, and that it will trigger the garbage collector as well.
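
The mismatch between these thresholds and the observed disk usage can be made explicit with a short sketch. It rests on assumptions that are not in the original article: the "kubeletconfig" wrapper key of the configz response, the helper names, and the mapping of the 9% free space seen on /var/vcap/store to both nodefs.available and imagefs.available. Only percentage thresholds are evaluated; quantities such as "100Mi" are skipped.

```python
import json

# Assumed shape of the configz response, trimmed to the fields quoted above.
CONFIGZ = json.loads("""
{
  "kubeletconfig": {
    "evictionHard": {
      "imagefs.available": "15%",
      "memory.available": "100Mi",
      "nodefs.available": "10%",
      "nodefs.inodesFree": "5%"
    },
    "evictionPressureTransitionPeriod": "5m0s"
  }
}
""")


def parse_percent(value: str) -> float:
    """Convert a threshold such as '15%' to the number 15.0."""
    if not value.endswith("%"):
        raise ValueError(f"not a percentage threshold: {value!r}")
    return float(value[:-1])


def crossed_signals(eviction_hard: dict, observed_pct: dict) -> list:
    """List the signals whose observed percentage is below the hard threshold."""
    crossed = []
    for signal, threshold in eviction_hard.items():
        # Skip signals we have no observation for and non-percentage values.
        if signal not in observed_pct or not threshold.endswith("%"):
            continue
        if observed_pct[signal] < parse_percent(threshold):
            crossed.append(signal)
    return sorted(crossed)


if __name__ == "__main__":
    hard = CONFIGZ["kubeletconfig"]["evictionHard"]
    # Assumption: the 9% free figure applies to both filesystems.
    observed = {"nodefs.available": 9.0, "imagefs.available": 9.0}
    print("Thresholds crossed:", crossed_signals(hard, observed))
```

Under these assumptions both disk thresholds are crossed, so a healthy kubelet would be expected to report disk pressure and reclaim space.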

Environment

Product Version: 1.7

Resolution

Cause of the problem: The issue can occur at random. It is related to the cAdvisor metrics not containing details about containers and disk pressure; as a result, the kubelet is never aware of the problem.

Workaround: Restart the kubelet on the affected workers, then check whether disk pressure is reported.

Expected result: The garbage collector triggers and removes the unused images, and free space returns to the expected percentage.

The problem is observed in 1.7.x and 1.8.x. There is currently no fix; this KB will be updated when a fix is available.