TKGI worker nodes are deployed by BOSH and have at least three disks attached when no Kubernetes persistent volumes are in use. These three disks are:
/ (the root disk)
/var/vcap/data (the ephemeral disk)
/var/vcap/store (the BOSH-created persistent disk)
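To see how these mount points map to block devices on a worker node, lsblk can be used. This is a minimal check; the device names it reports vary by IaaS, though in this article's example they match the df output shown later:
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
Typically / is on the first disk, /var/vcap/data (ephemeral) on the second, and /var/vcap/store (persistent) on the third.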
The procedure in this article can be used to recover BOSH-created persistent disks.
All Versions of VMware Tanzu Kubernetes Grid Integrated Edition
Persistent disk corruption can occur due to underlying IaaS or filesystem issues.
Important Note: bosh recreate and bosh cck recreate or repair the VM itself; they do not repair filesystem corruption on a BOSH-created persistent disk, which is why the manual procedure below is needed.
In this example, the worker node with IP 10.20.0.5 has the corrupted persistent disk. First, identify the affected Kubernetes node:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
011704a1-5f0f-4cb9-bd91-f9ad7aec17e5 Ready <none> 20h v1.23.7+vmware.1 10.20.0.5 10.20.0.5 Ubuntu 16.04.7 LTS 4.15.0-191-generic containerd://1.6.4
8334e164-8e9b-4ffb-9c89-bfe015e094a8 Ready <none> 20h v1.23.7+vmware.1 10.20.0.4 10.20.0.4 Ubuntu 16.04.7 LTS 4.15.0-191-generic containerd://1.6.4
c649ec99-bb3a-4049-9c57-1751f6de271e Ready <none> 21h v1.23.7+vmware.1 10.20.0.3 10.20.0.3 Ubuntu 16.04.7 LTS 4.15.0-191-generic containerd://1.6.4
bosh vms -d service-instance_77e44aad-1a76-4980-8d4e-43d7c273d167 | grep 10.20.0.5
worker/fcd09dc3-9e7a-4528-8015-22620b553f27 running az 10.20.0.5 vm-c2b8073f-949d-4891-b420-36769ecdee60 medium.disk true bosh-vsphere-esxi-ubuntu-xenial-go_agent/621.265
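Optionally, the persistent disk CID attached to this worker can be confirmed with bosh instances --details. This is an extra sanity check and is not required for the rest of the procedure:
bosh -d service-instance_77e44aad-1a76-4980-8d4e-43d7c273d167 instances --details | grep fcd09dc3-9e7a-4528-8015-22620b553f27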
Note: Other drain options may be needed if the drain fails; see the example after the drain output below.
kubectl drain 011704a1-5f0f-4cb9-bd91-f9ad7aec17e5 --ignore-daemonsets
node/011704a1-5f0f-4cb9-bd91-f9ad7aec17e5 cordoned
WARNING: ignoring DaemonSet-managed Pods: pks-system/fluent-bit-7rg24, pks-system/telegraf-xjsx4
evicting pod kube-system/coredns-67bd78c556-9vwfd
pod/coredns-67bd78c556-9vwfd evicted
node/011704a1-5f0f-4cb9-bd91-f9ad7aec17e5 drained
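If the default drain is blocked, for example by pods using emptyDir volumes or pods that are not managed by a controller, additional flags can be passed. This is an illustrative example only; review the impact of --force before using it, because it evicts pods that are not managed by a ReplicaSet, Deployment, DaemonSet, StatefulSet or Job:
kubectl drain 011704a1-5f0f-4cb9-bd91-f9ad7aec17e5 --ignore-daemonsets --delete-emptydir-data --force --grace-period=60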
kubectl get nodes
NAME STATUS ROLES AGE VERSION
011704a1-5f0f-4cb9-bd91-f9ad7aec17e5 Ready,SchedulingDisabled <none> 20h v1.23.7+vmware.1
8334e164-8e9b-4ffb-9c89-bfe015e094a8 Ready <none> 20h v1.23.7+vmware.1
c649ec99-bb3a-4049-9c57-1751f6de271e Ready <none> 21h v1.23.7+vmware.1
bosh update-resurrection off -d service-instance_77e44aad-1a76-4980-8d4e-43d7c273d167
bosh -d service-instance_77e44aad-1a76-4980-8d4e-43d7c273d167 ssh worker/fcd09dc3-9e7a-4528-8015-22620b553f27
sudo su -
monit stop all
To confirm that everything has stopped, run:
monit summary
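After monit stop all, every job should eventually report not monitored in the summary. As a convenience, the summary can be watched until nothing is left running or pending (this check is not part of the original procedure):
watch -n 5 monit summary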
Before repairing /var/vcap/store, identify the block device backing it. In this example, /var/vcap/store is on /dev/sdc1:
df -h
Filesystem Size Used Avail Use% Mounted on
<------ Truncated Output ------>
/dev/sda1 2.9G 1.4G 1.4G 52% /
/dev/sdb1 32G 3.5G 27G 12% /var/vcap/data
tmpfs 16M 4.0K 16M 1% /var/vcap/data/sys/run
/dev/sdc1 50G 2.1G 45G 5% /var/vcap/store
<------ Truncated Output ------>
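Optionally, the filesystem type on the persistent disk can be confirmed before running the check. This assumes blkid is available on the stemcell:
blkid /dev/sdc1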
umount /var/vcap/store
If umount fails because the device is busy, identify which processes are blocking the operation using fuser -m -u -v /dev/sdc1 or fuser -m -u -v /var/vcap/store, and stop them with kill <PID>. Once the filesystem is unmounted, run fsck against the device:
fsck /dev/sdc1
fsck from util-linux 2.27.1
e2fsck 1.42.13 (17-May-2015)
/dev/sdc1: clean, 12599/3276800 files, 794069/13106688 blocks
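If fsck reports the filesystem as clean but corruption is still suspected, a full check can be forced. For ext4, -f makes e2fsck check the filesystem even when it is marked clean, and -y answers repair prompts automatically. This is a general e2fsck example rather than a step from the procedure above, so use it with care:
e2fsck -f -y /dev/sdc1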
mount /dev/sdc1 /var/vcap/store
mount | grep sdc
/dev/sdc1 on /var/vcap/store type ext4 (rw,relatime,data=ordered)
Start all the processes again:
monit start all
As part of this stop and start, kubelet is also restarted, which should bring the node out of the SchedulingDisabled state.
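Verify with kubectl get nodes that the node has returned to the Ready state. If it remains in SchedulingDisabled state after kubelet restarts, it can be uncordoned manually as a fallback:
kubectl uncordon 011704a1-5f0f-4cb9-bd91-f9ad7aec17e5
Once the node is healthy and scheduling workloads again, re-enable resurrection for the deployment, mirroring the earlier command that disabled it:
bosh update-resurrection on -d service-instance_77e44aad-1a76-4980-8d4e-43d7c273d167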