TKGS Volume Mounts are not mounted after reboot on vSphere 7.0U2 or earlier



Article ID: 323441


Products

VMware vCenter Server

Issue/Introduction

Symptoms:
After a TKGS node is rebooted, the custom volume mount is not mounted, causing various issues depending on where the custom mount point is. For example, if a worker node's mount point is /var/lib/containerd, kubelet will fail to start the "pause" pod because the embedded pause image is located under the /var/lib/containerd mount point.
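On an affected worker node, this typically shows up as containerd and kubelet failures after the reboot. If you have SSH access to the node, the service journals are a quick way to confirm; the exact messages vary, so treat this only as a starting point:
# journalctl -u containerd --no-pager | tail -n 50
# journalctl -u kubelet --no-pager | tail -n 50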

Environment

VMware vCenter Server 7.0.x

Cause

This is a known issue with the volume mount feature that affects any volume mount created before vSphere 7.0U3c and its associated Supervisor Cluster version.

Resolution

To resolve this issue, update vSphere to 7.0U3 (or later) and its associated Supervisor Cluster version.

Workaround:
To work around this issue, first identify the volume mount point on the TanzuKubernetesCluster by running 'kubectl get tkc <name-of-tkc> -n <supervisor-namespace> -o yaml':
    nodePools:
    - name: workers
      replicas: 1
      storageClass: k8s-policy
      tkr:
        reference:
          name: v1.20.7---vmware.1-tkg.1.7fb9067
      vmClass: best-effort-small
      volumes:
      - capacity:
          storage: 16Gi
        mountPath: /var/lib/containerd
        name: containerd
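If you only need the mount paths, a jsonpath query can narrow the output. The sketch below assumes the nodePools list sits at spec.topology.nodePools, as in the example above; the path may differ between TKC API versions:
# kubectl get tkc <name-of-tkc> -n <supervisor-namespace> -o jsonpath='{range .spec.topology.nodePools[*]}{.name}{": "}{.volumes[*].mountPath}{"\n"}{end}'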



Then SSH into the node that does not have its disk mounted. If you do not already have a way to log in, one approach to retrieving the node's SSH key is sketched below.
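This sketch assumes the default TKGS convention of a <name-of-tkc>-ssh secret in the Supervisor namespace and the vmware-system-user account; both may vary by release:
# kubectl get secret <name-of-tkc>-ssh -n <supervisor-namespace> -o jsonpath='{.data.ssh-privatekey}' | base64 -d > tkc-node-key
# chmod 600 tkc-node-key
# ssh -i tkc-node-key vmware-system-user@<node-ip-address>
Once on the node, validate with lsblk that the disk is not mounted: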
# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
fd0      2:0    1    4K  0 disk
sda      8:0    0   16G  0 disk
├─sda1   8:1    0    4M  0 part
├─sda2   8:2    0   10M  0 part /boot/efi
└─sda3   8:3    0   16G  0 part /
sdb      8:16   0    4G  0 disk
└─sdb1   8:17   0    4G  0 part 
sr0     11:0    1 1024M  0 rom
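Before mounting, you can confirm the filesystem type on the partition (ext4 in this example); blkid or lsblk -f will report it in case the volume was formatted differently:
# blkid /dev/sdb1
# lsblk -f /dev/sdb1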
Then run the following commands to mount the partition. If the mount path is used by a service, stop that service before mounting the disk. In this example the volume belongs at /var/lib/containerd, so the containerd service must be stopped first.
# systemctl stop containerd
# mount -t ext4 /dev/sdb1 /var/lib/containerd
Run lsblk again to confirm that the partition is mounted, then start the service.
# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
fd0      2:0    1    4K  0 disk
sda      8:0    0   16G  0 disk
├─sda1   8:1    0    4M  0 part
├─sda2   8:2    0   10M  0 part /boot/efi
└─sda3   8:3    0   16G  0 part /
sdb      8:16   0    4G  0 disk
└─sdb1   8:17   0    4G  0 part /var/lib/containerd
sr0     11:0    1 1024M  0 rom
# systemctl start containerd
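Once containerd is running again, it is worth confirming that the service stays up and, from a session with access to the guest cluster's kubeconfig, that the node returns to the Ready state:
# systemctl status containerd --no-pager
# kubectl get nodes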




Additional Information

Note: If your volume mount is on the control plane nodes under /var/lib/etcd, you may need to repair the etcd database. This is a delicate process that can cause data loss on the cluster; as such, we request that anyone running into this issue open a case with VMware Support so we can assist in repairing the guest cluster's etcd database.