kubelet logs:
"Disk usage on image filesystem is over the high threshold, trying to free bytes down to the low threshold"
"Image garbage collection failed once. Stats initialization may not have completed yet" and err="failed to garbage collect required amount of images."
The containerd and kubelet services do not reduce disk usage.
TCA 3.2
TKG 2.5.2
The root cause is a stale or hung backup process (external to VMware components) that retains active file handles on deleted temporary files.
Although the backup application has technically "deleted" the files (often located in /tmp/cbur/), the OS cannot reclaim the disk space because the process that holds them open is still active or hung.
Engage the backup solution vendor to investigate why the backup jobs are failing or hanging on the worker nodes.
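The behaviour described above can be reproduced on any Linux host. The following is a minimal sketch for illustration only, assuming /proc is mounted; the scratch file and descriptor number 9 are hypothetical and unrelated to the backup application.

```shell
#!/bin/sh
# Demonstration only: hold a file open, delete it, and show the kernel still tracks it.
demo_file=$(mktemp)                        # hypothetical scratch file, not a backup artifact
echo "payload" > "$demo_file"
exec 9<"$demo_file"                        # open descriptor 9 on the file in this shell
rm -f "$demo_file"                         # remove the name; the data stays allocated
held_target=$(readlink "/proc/$$/fd/9")    # link target now ends in " (deleted)"
held_data=$(cat <&9)                       # contents are still readable via the descriptor
echo "$held_target"
echo "$held_data"
exec 9<&-                                  # closing the descriptor lets the OS reclaim the space
```

The space is returned only when the last open descriptor is closed, which is why ending the hung process frees the disk.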
To resolve the immediate disk space issue, the stale process must be terminated.
Log in to the affected worker node as root.
Run the following command to identify deleted files that are still being held open by a process: find /proc/*/fd -ls 2>/dev/null | grep "(deleted)"
Look for output similar to: /tmp/cbur/20XX11272XXXXX_LOCAL_xxxx_admincli_volume.tar.gz (deleted)
Note the Process ID (PID) from the output of the command above.
Terminate the stale process to release the file handles: kill -9 <PID>
Verify that filesystem utilization has dropped to expected levels: df -h
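The identification steps above can be sketched as a small helper that prints the PID next to each deleted file it still holds open. This is a hedged illustration, not a vendor-supplied tool: the function name list_deleted_open_files and its optional path filter are assumptions introduced here, and it should be run as root so that every process under /proc is readable.

```shell
#!/bin/sh
# Print "PID<TAB>path (deleted)" for every deleted file still held open.
# Optional first argument: only report paths containing this substring.
list_deleted_open_files() {
    filter="${1:-}"
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors that vanished or that this user cannot read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        # Keep only link targets that the kernel marks as deleted.
        case "$target" in
            *" (deleted)") ;;
            *) continue ;;
        esac
        # Apply the optional substring filter (an empty filter matches everything).
        case "$target" in
            *"$filter"*) ;;
            *) continue ;;
        esac
        # Extract the PID from /proc/<PID>/fd/<N>.
        pid=${fd#/proc/}
        pid=${pid%%/*}
        printf '%s\t%s\n' "$pid" "$target"
    done | sort -u
}
```

For example, list_deleted_open_files /tmp/cbur/ limits the output to the backup staging directory named above. Review the reported PIDs before terminating anything, then confirm the result with df -h.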