TKGI v1.23.0 changes imagefs directory

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

Upgrading to TKGI v1.23.0 changes the directory used for imagefs.

Environment

TKGI v1.23.0

Cause

To check the directory location, run the command crictl imagefsinfo on a worker VM:

TKGI v1.22.x

crictl imagefsinfo
{
  "status": {
    "imageFilesystems": [
      {
        "timestamp": "1762182059818853993",
        "fsId": {
          "mountpoint": "/var/vcap/store/containerd/io.containerd.snapshotter.v1.overlayfs"
        },

TKGI v1.23.0

crictl imagefsinfo
{
  "status": {
    "timestamp": "1762264051739809869",
    "fsId": {
      "mountpoint": "/var/vcap/store/io.containerd.snapshotter.v1.overlayfs"
    },

This change does not remove or clean up the old directory. The old directory persists while the new directory is populated with data and images, so twice the amount of disk space is used. This can create disk pressure, which results in pod eviction.
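As a rough sketch, the crictl imagefsinfo check can be scripted so that the mount point is pulled out of the output and labelled by where it sits under /var/vcap/store. The function names imagefs_mountpoint and imagefs_layout and the labels original and relocated are made up for this illustration and are not part of TKGI or crictl. The sed pattern assumes pretty-printed output with the "mountpoint" key and its value on one line, as in the samples above.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names, not part of TKGI or crictl).

# Print the first "mountpoint" value found in crictl imagefsinfo output on stdin.
# Assumes the key and its value sit on the same line.
imagefs_mountpoint() {
  sed -n 's/.*"mountpoint"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Label a mount point by where it sits under /var/vcap/store.
imagefs_layout() {
  case "$1" in
    /var/vcap/store/containerd/*)    echo "original" ;;   # path seen on v1.22.x
    /var/vcap/store/io.containerd.*) echo "relocated" ;;  # path seen on v1.23.0
    *)                               echo "unknown" ;;
  esac
}

# Usage on a worker VM (left commented out so the file can be sourced safely):
#   imagefs_layout "$(crictl imagefsinfo | imagefs_mountpoint)"
```

A result of relocated on a worker VM suggests that the old /var/vcap/store/containerd directory may still be holding stale data.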

Resolution

VMware Tanzu recommends waiting for the TKGI v1.23.1 patch release, where the issue is resolved. If that is not possible, choose Scenario 1 or 2 below.

Note that if only some of the clusters have been, or must be, upgraded to v1.23.0, the remaining clusters can stay on v1.22.x. All clusters can then be upgraded to v1.23.1 once it is available.

Scenario 1.

If the cluster has been upgraded to TKGI v1.23.0, the workaround is to clean up the old directory manually:

To confirm the old directory is not being used, run the command below. If there is no output, it is safe to remove the directory. If the command produces output, open a support case with Broadcom Support.
sudo lsof +D /var/vcap/store/containerd

To check disk usage:
sudo du -sh /var/vcap/store/containerd*
7.1G /var/vcap/store/containerd

Clean up stale data:
sudo rm -rf /var/vcap/store/containerd
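The three steps above can be chained so that the removal only runs when the lsof check prints nothing. This is a minimal sketch, not an official TKGI tool: the function names no_open_files and cleanup_stale_dir are hypothetical, and it assumes lsof and sudo are available on the worker VM. Because lsof also exits non-zero when it finds no open files, the sketch looks at the output rather than the exit status, and it refuses to delete anything when lsof cannot be found.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names, not part of TKGI).

# Succeed only when stdin (the lsof output) is empty.
no_open_files() {
  local output
  output=$(cat)
  [ -z "$output" ]
}

# Check, report, and remove one stale directory. Defined but not called here.
cleanup_stale_dir() {
  local dir=$1
  if [ -z "$dir" ] || ! sudo test -d "$dir"; then
    echo "Not a directory: '$dir'" >&2
    return 2
  fi
  # Without lsof the check would print nothing and look "safe", so stop early.
  if ! command -v lsof >/dev/null; then
    echo "lsof not found; refusing to delete $dir" >&2
    return 2
  fi
  if sudo lsof +D "$dir" | no_open_files; then
    sudo du -sh "$dir"      # show how much space is about to be freed
    sudo rm -rf "$dir"
  else
    echo "Files under $dir are still open; open a support case instead." >&2
    return 1
  fi
}

# Usage for Scenario 1 (commented out so sourcing this file deletes nothing):
#   cleanup_stale_dir /var/vcap/store/containerd
```

Keeping the emptiness check in its own function makes it possible to exercise the decision logic with sample text before running anything destructive.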

Scenario 2.

If the tile has been upgraded but the clusters have not been upgraded yet, follow the steps in this KB: TKGI v1.23.0 upgrade leads to changes in the imagefs directory that can cause disk pressure and pod eviction

Additional Information

Expected Fix in TKGI v1.23.1

The new patch will restore the location of the folder to the original path: /var/vcap/store/containerd

The cleanup step below is required if you upgraded to v1.23.0 and then to v1.23.1.

Post upgrade cleanup

After the upgrade to v1.23.1, run the cleanup steps below for each of the following directories in /var/vcap/store:

drwxr-xr-x 4 root root 4096 Sep 18 02:08 io.containerd.content.v1.content
drwxr-xr-x 4 root root 4096 Sep 18 02:09 io.containerd.grpc.v1.cri
drwx------ 2 root root 4096 Sep 18 02:08 io.containerd.grpc.v1.introspection
drwx--x--x 2 root root 4096 Sep 18 02:08 io.containerd.metadata.v1.bolt
drwx--x--x 2 root root 4096 Sep 18 02:08 io.containerd.runtime.v1.linux
drwx--x--x 3 root root 4096 Sep 18 02:09 io.containerd.runtime.v2.task
drwx------ 2 root root 4096 Sep 18 02:08 io.containerd.snapshotter.v1.blockfile
drwx------ 2 root root 4096 Sep 18 02:08 io.containerd.snapshotter.v1.btrfs
drwx------ 3 root root 4096 Sep 18 02:08 io.containerd.snapshotter.v1.native
drwx------ 3 root root 4096 Sep 18 02:08 io.containerd.snapshotter.v1.overlayfs

Do not remove /var/vcap/store/containerd

The commands below use /var/vcap/store/io.containerd.content.v1.content as an example. Repeat them for each of the other directories listed above.

To confirm the old directory is not being used, run the command below. If there is no output, it is safe to remove the directory. If the command produces output, open a support case with Broadcom Support.
sudo lsof +D /var/vcap/store/io.containerd.content.v1.content

To check disk usage:
sudo du -sh /var/vcap/store/io.containerd.content.v1.content*
2.1G /var/vcap/store/io.containerd.content.v1.content

Clean up stale data:
sudo rm -rf /var/vcap/store/io.containerd.content.v1.content
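Since the same three commands apply to every directory in the listing, a small loop can generate the paths to review. The sketch below only prints paths and deletes nothing by itself. The function name stale_dirs is hypothetical, and the directory names are copied from the listing above, so confirm them against the actual contents of /var/vcap/store before acting on them.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name, not part of TKGI).

# Print the full path of each directory left behind by v1.23.0.
# /var/vcap/store/containerd is deliberately NOT in this list.
stale_dirs() {
  local store=${1:-/var/vcap/store}
  local name
  for name in \
    io.containerd.content.v1.content \
    io.containerd.grpc.v1.cri \
    io.containerd.grpc.v1.introspection \
    io.containerd.metadata.v1.bolt \
    io.containerd.runtime.v1.linux \
    io.containerd.runtime.v2.task \
    io.containerd.snapshotter.v1.blockfile \
    io.containerd.snapshotter.v1.btrfs \
    io.containerd.snapshotter.v1.native \
    io.containerd.snapshotter.v1.overlayfs
  do
    printf '%s/%s\n' "$store" "$name"
  done
}

# Dry run on a worker VM (commented out; prints usage and any open files):
#   stale_dirs | while read -r dir; do
#     sudo du -sh "$dir"
#     sudo lsof +D "$dir"
#   done
```

Running the dry run first shows how much space each directory holds and whether any of them is still in use, before any rm -rf is issued by hand.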