After an upgrade to TKGI v1.23.0, the directory used for imagefs changes.
TKGI v1.23.0
To check the directory location, run crictl imagefsinfo on a worker VM:
TKGI v1.22.x
crictl imagefsinfo
{
"status": {
"imageFilesystems": [
{
"timestamp": "1762182059818853993",
"fsId": {
"mountpoint": "/var/vcap/store/containerd/io.containerd.snapshotter.v1.overlayfs"
},
TKGI v1.23.0
crictl imagefsinfo
{
"status": {
"timestamp": "1762264051739809869",
"fsId": {
"mountpoint": "/var/vcap/store/io.containerd.snapshotter.v1.overlayfs"
},
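To compare the mountpoint before and after the upgrade without reading the full JSON, the value can be filtered out of the command output. The following is a minimal sketch, assuming the pretty-printed output shown above with one key per line. The helper name imagefs_mountpoints is hypothetical and is not part of crictl.

```shell
#!/bin/sh
# Hypothetical helper: print every "mountpoint" value found in
# crictl imagefsinfo JSON read from stdin.
# Assumes one "mountpoint" key per line, as in the output shown above.
imagefs_mountpoints() {
  sed -n 's/.*"mountpoint"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage on a worker VM:
#   sudo crictl imagefsinfo | imagefs_mountpoints
```

On v1.23.0 the printed path is expected to sit directly under /var/vcap/store rather than under /var/vcap/store/containerd.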
This change does not remove or clean up the old directory. The old directory persists while the new directory fills with data and images, so the worker uses twice the disk space. This can create disk pressure, which results in pod eviction.
VMware Tanzu recommends that you wait for the TKGI v1.23.1 patch release, in which the issue is resolved. If that is not possible, follow Scenario 1 or Scenario 2 below.
Note that if only some clusters have been, or must be, upgraded to v1.23.0, the remaining clusters can stay on v1.22.x. All clusters can then be upgraded to v1.23.1 once it is available.
Scenario 1.
If the cluster has been upgraded to TKGI v1.23.0, the workaround is to clean up the old directory manually.
To confirm that the old directory is not in use, run the command below. If it produces no output, the directory is safe to remove. If it produces output, open a support case with Broadcom Support.
sudo lsof +D /var/vcap/store/containerd
To check disk usage:
sudo du -sh /var/vcap/store/containerd*
7.1G /var/vcap/store/containerd
Clean up stale data:
sudo rm -rf /var/vcap/store/containerd
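The three Scenario 1 commands can be chained so that the removal only runs when the lsof check is empty. This is a sketch under stated assumptions, not an official script: the function name cleanup_if_unused and the SUDO variable are hypothetical, and SUDO exists only so the function can be exercised without root (set it to an empty string for a dry run against a scratch directory).

```shell
#!/bin/sh
# Hypothetical helper: remove DIR only if lsof reports no open files under it.
# SUDO is a hypothetical override; when unset, commands run through sudo.
cleanup_if_unused() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "skip: $dir does not exist"
    return 0
  fi
  # lsof exits non-zero when it finds nothing, so only its output is checked.
  open_files=$(${SUDO-sudo} lsof +D "$dir" 2>/dev/null) || true
  if [ -n "$open_files" ]; then
    echo "in use, not removing: $dir" >&2
    return 1
  fi
  ${SUDO-sudo} du -sh "$dir"   # show how much space will be reclaimed
  ${SUDO-sudo} rm -rf "$dir"
}

# Usage on a worker VM upgraded to v1.23.0:
#   cleanup_if_unused /var/vcap/store/containerd
```

A non-zero return means lsof printed something. In that case, stop and open a support case as described above rather than removing the directory.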
Scenario 2.
If the tile has been upgraded but the clusters have not yet been upgraded, follow the steps in this KB: "TKGI v1.23.0 upgrade leads to changes in the imagefs directory that can cause disk pressure and pod eviction".
Expected Fix in TKGI v1.23.1
The new patch will restore the folder to its original path: /var/vcap/store/containerd
The cleanup steps below are required only if you upgraded to v1.23.0 and then to v1.23.1.
Post upgrade cleanup
After the upgrade to v1.23.1, run the cleanup steps below for each of the following directories in /var/vcap/store:
drwxr-xr-x 4 root root 4096 Sep 18 02:08 io.containerd.content.v1.content
drwxr-xr-x 4 root root 4096 Sep 18 02:09 io.containerd.grpc.v1.cri
drwx------ 2 root root 4096 Sep 18 02:08 io.containerd.grpc.v1.introspection
drwx--x--x 2 root root 4096 Sep 18 02:08 io.containerd.metadata.v1.bolt
drwx--x--x 2 root root 4096 Sep 18 02:08 io.containerd.runtime.v1.linux
drwx--x--x 3 root root 4096 Sep 18 02:09 io.containerd.runtime.v2.task
drwx------ 2 root root 4096 Sep 18 02:08 io.containerd.snapshotter.v1.blockfile
drwx------ 2 root root 4096 Sep 18 02:08 io.containerd.snapshotter.v1.btrfs
drwx------ 3 root root 4096 Sep 18 02:08 io.containerd.snapshotter.v1.native
drwx------ 3 root root 4096 Sep 18 02:08 io.containerd.snapshotter.v1.overlayfs
The commands below are for /var/vcap/store/io.containerd.content.v1.content. Repeat them for each of the other directories listed above.
To confirm that the directory is not in use, run the command below. If it produces no output, the directory is safe to remove. If it produces output, open a support case with Broadcom Support.
sudo lsof +D /var/vcap/store/io.containerd.content.v1.content
To check disk usage:
sudo du -sh /var/vcap/store/io.containerd.content.v1.content*
2.1G /var/vcap/store/io.containerd.content.v1.content
Clean up stale data:
sudo rm -rf /var/vcap/store/io.containerd.content.v1.content
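Because the same check applies to all ten directories, a loop can report which ones are unused before anything is removed. The sketch below only reports and deletes nothing. The directory names are copied from the listing above; the function name report_stale_dirs and the SUDO variable are hypothetical, and SUDO is there only to allow a run without root.

```shell
#!/bin/sh
# Directory names copied from the listing in this article.
STALE_DIRS="io.containerd.content.v1.content
io.containerd.grpc.v1.cri
io.containerd.grpc.v1.introspection
io.containerd.metadata.v1.bolt
io.containerd.runtime.v1.linux
io.containerd.runtime.v2.task
io.containerd.snapshotter.v1.blockfile
io.containerd.snapshotter.v1.btrfs
io.containerd.snapshotter.v1.native
io.containerd.snapshotter.v1.overlayfs"

# Hypothetical helper: print "unused", "in-use" or "missing" for each
# directory under BASE (default /var/vcap/store). Nothing is deleted.
report_stale_dirs() {
  base=${1:-/var/vcap/store}
  for name in $STALE_DIRS; do
    path=$base/$name
    if [ ! -d "$path" ]; then
      echo "missing $path"
    elif [ -n "$(${SUDO-sudo} lsof +D "$path" 2>/dev/null)" ]; then
      echo "in-use $path"
    else
      echo "unused $path"
    fi
  done
}

# Usage on a worker VM after the upgrade to v1.23.1:
#   report_stale_dirs
```

Directories reported as unused can then be sized and removed with the du and rm commands shown above. Any directory reported as in-use should be left in place and raised with support.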