TKGI v1.23.0 upgrade leads to changes in the imagefs directory that can cause disk pressure and pod eviction

Article ID: 417891

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

In TKGI v1.23.0, the imagefs directory has changed. When a cluster is upgraded from v1.22.x to v1.23.0, the old imagefs directory is not deleted. The old directory persists with its existing contents while the new directory is populated with data and images, which effectively doubles the disk usage unnecessarily. This can create disk pressure on the worker nodes, which in turn can result in pod eviction.
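To see how much space each copy is taking, something like the following could be run on a worker VM. This is a minimal sketch, not a supported tool: the function name `report_imagefs_usage` is hypothetical, the two directory paths are the ones reported by crictl in the Cause section, and the default store root of /var/vcap/store is inferred from those paths.

```shell
#!/bin/bash
# Print the size in KiB of each imagefs directory found under a store root.
# The store root defaults to /var/vcap/store (inferred from the crictl output).
report_imagefs_usage() {
  local store_root="${1:-/var/vcap/store}"
  local dir
  for dir in \
    "$store_root/containerd/io.containerd.snapshotter.v1.overlayfs" \
    "$store_root/io.containerd.snapshotter.v1.overlayfs"; do
    if [ -d "$dir" ]; then
      # du -sk prints "<size in KiB><TAB><path>"
      du -sk "$dir"
    else
      printf 'absent\t%s\n' "$dir"
    fi
  done
}

# Usage on a worker VM (reading these directories typically requires root):
#   report_imagefs_usage
```

If both lines show a size rather than "absent", the node is carrying both the old and the new directory.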

Environment

TKGI v1.23.0

Cause

The imagefs directory location has changed in TKGI v1.23.0.

To check the directory location, run the command crictl imagefsinfo on a worker VM:

TKGI v1.22.x

crictl imagefsinfo
{
  "status": {
    "timestamp": "1762182059818853993",
    "fsId": {
      "mountpoint": "/var/vcap/store/containerd/io.containerd.snapshotter.v1.overlayfs"
    },

TKGI v1.23.0

crictl imagefsinfo
{
  "status": {
    "timestamp": "1762264051739809869",
    "fsId": {
      "mountpoint": "/var/vcap/store/io.containerd.snapshotter.v1.overlayfs"
    },
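The comparison above can be scripted when many workers need to be checked. The following is a minimal sketch: the function name `classify_imagefs` is hypothetical, and it assumes the crictl output has the shape shown above, with one "mountpoint" field per line.

```shell
#!/bin/bash
# Classify the imagefs mountpoint reported by `crictl imagefsinfo`.
# Reads the JSON on stdin and prints which layout it matches.
classify_imagefs() {
  local mountpoint
  # Extract the first "mountpoint" value without requiring jq.
  mountpoint=$(sed -n 's/.*"mountpoint": *"\([^"]*\)".*/\1/p' | head -n 1)
  case "$mountpoint" in
    /var/vcap/store/containerd/io.containerd.snapshotter.v1.overlayfs)
      echo "v1.22.x layout: $mountpoint" ;;
    /var/vcap/store/io.containerd.snapshotter.v1.overlayfs)
      echo "v1.23.0 layout: $mountpoint" ;;
    *)
      echo "unrecognized layout: $mountpoint" ;;
  esac
}

# Usage on a worker VM (crictl typically requires root):
#   crictl imagefsinfo | classify_imagefs
```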

 

Resolution

VMware Tanzu recommends waiting for the TKGI v1.23.1 patch release, in which the issue is resolved. If that is not possible, choose Scenario 1 or Scenario 2 below.

Note that if only some of the clusters have been, or must be, upgraded to v1.23.0, the remaining clusters can stay on v1.22.x. All clusters can then be upgraded to v1.23.1 once it is available.

Scenario 1: If you have already upgraded a cluster to TKGI v1.23

Refer to the TKGI v1.23.0 changes imagefs directory KB.

Scenario 2: If you have upgraded the TKGI tile to v1.23 but the clusters have not yet been upgraded, apply the 'os-conf' patch and then perform the cluster upgrade.

For additional information about how to apply the 'os-conf' patch, refer to the How to use BOSH os-conf release to run script at different deployment stage KB.

Example runtime config that targets specific service instances before the upgrade:

Create a runtime.yml file:

releases:
- name: "os-conf"
  version: "23.0.0"
addons:
- name: containerd-configuration
  jobs:
  - name: pre-start-script
    release: os-conf
    properties:
      script: |-
        #!/bin/bash
        CONTAINERD_CONFIG=/var/vcap/jobs/containerd/config/config.toml
        if [ -e "$CONTAINERD_CONFIG" ]; then
          echo "Changing root dir of containerd"
          # Replace line 2 of config.toml with the v1.22.x root directory setting.
          sudo sed -i '2c\root="/var/vcap/store/containerd"' "$CONTAINERD_CONFIG"
          echo "Done"
        fi
  include:
    deployments: [service-instance_XXX, service-instance_YYY]    # Optional, you can define which deployments (TKGi clusters) this runtime config will be applied to.
    instance_groups: [worker]
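The sed expression in the pre-start script replaces line 2 of the file wholesale, so it only has the intended effect if the root setting sits on line 2 of config.toml. The sketch below runs the same expression against a throwaway file to show that behavior. The function name `demo_root_rewrite` and the placeholder file contents are hypothetical, the real config.toml will look different, and `sed -i` with the one-line `c\` form assumes GNU sed.

```shell
#!/bin/bash
# Demonstrate the sed expression from the pre-start script on a temporary file.
demo_root_rewrite() {
  local cfg
  cfg=$(mktemp)
  # Placeholder three-line file standing in for config.toml (not its real contents).
  printf '%s\n' 'line-one' 'root="/some/other/path"' 'line-three' > "$cfg"
  # Same expression as the runtime config: replace line 2 with the new root setting.
  sed -i '2c\root="/var/vcap/store/containerd"' "$cfg"
  cat "$cfg"
  rm -f "$cfg"
}

# Usage:
#   demo_root_rewrite
```

Only the second line changes; lines 1 and 3 are left as they were. On a worker VM, printing line 2 of /var/vcap/jobs/containerd/config/config.toml with `sed -n 2p` after the pre-start script has run is one way to confirm the change landed.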

NOTE: If you don't specify any deployments, the runtime config is applied to any deployment, including new clusters.
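Once runtime.yml exists, it still has to be registered with the BOSH Director. The following is a minimal sketch of that step, assuming the os-conf release has already been uploaded as described in the referenced KB and that the BOSH CLI is already targeted at the correct Director. The config name `containerd-root-fix` and the `run` and `apply_runtime_config` helpers are hypothetical. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch: register runtime.yml as a named runtime config.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
CONFIG_NAME="containerd-root-fix"   # hypothetical name; choose your own

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

apply_runtime_config() {
  local file="${1:-runtime.yml}"
  # Upload the runtime config under its own name so it can be removed later.
  run bosh update-runtime-config --name="$CONFIG_NAME" "$file"
  # List runtime configs to confirm it was registered.
  run bosh configs --type=runtime
}

# Usage:
#   apply_runtime_config runtime.yml              # print only
#   DRY_RUN=0 apply_runtime_config runtime.yml    # actually run
```

Giving the config its own name, rather than using the default runtime config, keeps it separate from any other runtime configs on the Director and makes it easy to delete later.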

Additional Information

Expected Fix in TKGI v1.23.1

The v1.23.1 patch restores the imagefs directory to its original path: /var/vcap/store/containerd/io.containerd.snapshotter.v1.overlayfs

Therefore, before upgrading to v1.23.1, delete the os-conf runtime config that you created.
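Removing the runtime config could look like the sketch below. It assumes the os-conf addon was uploaded as a named runtime config. The name `containerd-root-fix` and the helper functions are hypothetical, so substitute the name the config was actually uploaded under (a runtime config uploaded without a name is called `default`). The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch: remove the os-conf runtime config before upgrading to v1.23.1.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
CONFIG_NAME="containerd-root-fix"   # hypothetical; use the name you uploaded it under

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

remove_runtime_config() {
  # Delete the named runtime config from the BOSH Director.
  run bosh delete-config --type=runtime --name="$CONFIG_NAME"
  # List the remaining runtime configs to confirm it is gone.
  run bosh configs --type=runtime
}

# Usage:
#   remove_runtime_config              # print only
#   DRY_RUN=0 remove_runtime_config    # actually run
```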