Update cluster class to install NFS client on Ubuntu 22.04 and TKGm 2.5.1

Products

VMware Tanzu Kubernetes Grid Plus
Tanzu Kubernetes Grid
VMware Tanzu Kubernetes Grid
VMware Tanzu Kubernetes Grid 1.x
VMware Tanzu Kubernetes Grid Management
VMware Tanzu Kubernetes Grid Plus 1.x

Issue/Introduction

After upgrading Ubuntu-based class-based ("classy") clusters to Ubuntu 22.04 on TKGm 2.5.1, the NFS client is no longer installed by default.

Example errors when describing the K8s node:

kubectl describe node $NODE

Type     Reason       Age                    From     Message
  ----     ------       ----                   ----     -------
  Warning  FailedMount  54s (x5138 over 7d5h)  kubelet  MountVolume.SetUp failed for volume "nfs-subdir-external-provisioner-root" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs xxx-yyy-zzz:/feature-internal-calico /var/lib/kubelet/pods/c4e0138c-de9a-4c6f-b396-29333cc9b460/volumes/kubernetes.io~nfs/nfs-subdir-external-provisioner-root
Output: mount: /var/lib/kubelet/pods/c4e0138c-de9a-4c6f-b396-29333cc9b460/volumes/kubernetes.io~nfs/nfs-subdir-external-provisioner-root: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
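
The "bad option" message above is what mount prints when it cannot find a helper program such as /sbin/mount.nfs. As a minimal sketch, the helper's presence can be checked directly on a node. The function name has_nfs_helper and its optional root-directory argument are illustrative only and are not part of TKGm.

```shell
#!/bin/sh
# Report whether the mount.nfs helper is present.
# The optional first argument is a root directory prefix (hypothetical,
# useful for inspecting an unpacked image); it defaults to the live root.
has_nfs_helper() {
  root="${1:-}"
  for dir in /sbin /usr/sbin; do
    if [ -x "${root}${dir}/mount.nfs" ]; then
      echo "found ${dir}/mount.nfs"
      return 0
    fi
  done
  echo "mount.nfs helper not found"
  return 1
}

# Usage on a worker node:
#   has_nfs_helper
```

A non-zero exit status means the node cannot mount NFS volumes until the nfs-common package is installed.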

Environment

TKGm 2.5.1
TKGm 2.5.x

Cause

The NFS client packages were removed in TKGm 2.5.1 because NFS relies on rpcbind, which listens on port 111 by default. rpcbind was disabled to comply with Ubuntu CIS Benchmark C-2.3.6.
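
Because installing nfs-common typically brings rpcbind back, it can be useful to check whether a node is listening on port 111 before and after the change. The sketch below filters the output of ss; the function name rpcbind_port_open is hypothetical, and it assumes the default ss column layout in which the local address is the fifth field.

```shell
#!/bin/sh
# Read `ss -H -ltnu` output on stdin and succeed if any socket
# is listening on port 111 (the default rpcbind port).
rpcbind_port_open() {
  awk '{ n = split($5, part, ":"); if (part[n] == "111") hit = 1 }
       END { exit (hit ? 0 : 1) }'
}

# Usage on a node:
#   ss -H -ltnu | rpcbind_port_open && echo "port 111 is listening"
```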

TKGm does not support external NFS, and so far, there have been no reports of usage. NFS is only supported in the datastore via CSI.

Reinstalling the packages reintroduces the security exposure that the hardening removed, and doing so is out of scope for the product.
There are two ways to work around the hardening:

1) Bring Your Own Image: change the variables passed to the image build (adding these packages) and export a new template that can be used. This procedure is more complex and is not covered in this article.
2) Create a custom ClusterClass that installs the package before the node joins the cluster (preKubeadmCommands). The procedure is described below.

Resolution

Prerequisites

TKGm management cluster is created (tested with 2.5)
ytt installed
kubectl installed and set to management cluster context
Tanzu CLI

This process creates a custom ClusterClass that deploys the worker nodes of a workload cluster with the nfs-common package installed. Creating custom ClusterClass objects is covered at a high level in the Tanzu Kubernetes Grid documentation.

NOTE: This example points to an APT repository, my.repo.com, that is known to host the Ubuntu packages. Replace it with your own repository.

In Tanzu Kubernetes Grid 2.3.0 and later, after you deploy a management cluster, the default ClusterClass manifest is available in the ~/.config/tanzu/tkg/clusterclassconfigs folder. Copy it to your working directory:
- cp ~/.config/tanzu/tkg/clusterclassconfigs/tkg-vsphere-default-v1.2.0.yaml .

To customize your ClusterClass manifest, you create ytt overlay files alongside the manifest.

mkdir overlays
cd overlays

Create the two files below (nfscommon.yaml and filter.yaml) inside the overlays folder. After creating them, return to the top-level folder: cd ..
Create a file nfscommon.yaml:
#@ load("@ytt:overlay", "overlay")

#@overlay/match by=overlay.subset({"kind":"ClusterClass"})
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: tkg-vsphere-default-v1.2.0-extended
spec:
  #@overlay/match missing_ok=True
  variables:
  #@overlay/append
  - name: nfsCommon
    required: false
    schema:
      openAPIV3Schema:
        type: boolean
        default: false
  #@overlay/match expects=1
  patches:
  #@overlay/append
  - name: nfs
    enabledIf: '{{ .nfsCommon }}'
    definitions:
      - selector:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          matchResources:
            machineDeploymentClass:
              names:
                - tkg-worker
        jsonPatches:
          - op: add
            path: /spec/template/spec/preKubeadmCommands/-
            value: |
              sudo add-apt-repository -s https://my.repo.com/ubuntu/ jammy main -y && \
              sudo apt update -y && \
              sudo apt-get install -y libnfsidmap1=1:2.6.1-1ubuntu1 --allow-downgrades --allow-change-held-packages && \
              sudo apt-get install -y nfs-common --allow-change-held-packages


Create a file filter.yaml:
#@ load("@ytt:overlay", "overlay")

#@overlay/match by=overlay.not_op(overlay.subset({"kind": "ClusterClass"})),expects="0+"
---
#@overlay/remove

Use the default ClusterClass manifest copied earlier to generate the base ClusterClass:
- ytt -f tkg-vsphere-default-v1.2.0.yaml -f overlays/filter.yaml > default_cc.yaml
Generate the custom ClusterClass. This command applies all files in the overlays folder:
- ytt -f default_cc.yaml -f overlays/ > custom_cc.yaml
Verify the generated file, which should contain the new name and the apt-get install commands, then install the custom ClusterClass in the management cluster:
- kubectl apply -f custom_cc.yaml

You should see the following output when you run kubectl get clusterclasses:

NAME                                  AGE
tkg-vsphere-default-v1.2.0            21h
tkg-vsphere-default-v1.2.0-extended   20h
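
The manual check of the generated file (new name and apt-get install commands present) can also be scripted, for example before running kubectl apply or whenever the manifest is regenerated. This is a minimal sketch; check_custom_cc is an illustrative helper name, and the strings it searches for are the ones used in the overlay above.

```shell
#!/bin/sh
# Sanity-check the generated ClusterClass manifest before applying it.
# check_custom_cc is an illustrative helper name, not a TKGm command.
check_custom_cc() {
  file="${1:-custom_cc.yaml}"
  # The overlay renames the class, so the extended name must be present.
  if ! grep -qF 'name: tkg-vsphere-default-v1.2.0-extended' "$file"; then
    echo "extended ClusterClass name not found in $file"
    return 1
  fi
  # The nfs patch must carry the package installation command.
  if ! grep -qF 'apt-get install -y nfs-common' "$file"; then
    echo "nfs-common install command not found in $file"
    return 1
  fi
  echo "$file looks complete"
}

# Usage from the top-level folder:
#   check_custom_cc custom_cc.yaml && kubectl apply -f custom_cc.yaml
```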

We have now created an "extended" ClusterClass that accepts a new variable: nfsCommon.

To create a new cluster with the custom class, create a file cluster_overlay.yaml with the following content, then follow the steps below:

#@ load("@ytt:overlay", "overlay")

#@overlay/match by=overlay.subset({"kind":"Cluster"})
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
spec:
  topology:
    class: tkg-vsphere-default-v1.2.0-extended
    variables:
    - name: nfsCommon
      value: true

Copy the cluster configuration file to your working directory. There are multiple ways to complete this step; the command below is mainly for demonstration:
- cp ~/.config/tanzu/tkg/clusterconfigs/{config_file}.yaml ./workload-1.yaml
Generate the default workload cluster manifest:
- tanzu cluster create --file workload-1.yaml --dry-run > default_cluster.yaml
Using the overlay, create the custom manifest:
- ytt -f default_cluster.yaml -f cluster_overlay.yaml > custom_cluster.yaml
Deploy the cluster:
- tanzu cluster create -f custom_cluster.yaml

For existing clusters the procedure is similar. Update two fields on the existing Cluster object:

Set .spec.topology.class to tkg-vsphere-default-v1.2.0-extended
Add the variable nfsCommon to .spec.topology.variables, as shown in the example below

spec:
...
  topology:
    class: tkg-vsphere-default-v1.2.0-extended
    controlPlane:
      metadata:
        annotations:
          run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
      replicas: 1
    variables:
    - name: nfsCommon
      value: true
    - name: cni
      value: antrea
    - name: controlPlaneCertificateRotation
...

The change immediately triggers a rollout of the worker nodes, which are recreated with nfs-common installed.
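
Instead of editing the Cluster object by hand, both field changes can be expressed as a single JSON patch and applied with kubectl patch. The sketch below only builds the patch document; the function name nfs_patch_json, the cluster name workload-1, and the namespace default are assumptions for illustration. It also assumes that .spec.topology.variables already exists on the cluster, as in the example above.

```shell
#!/bin/sh
# Print a JSON patch that switches the cluster to the extended class
# and appends the nfsCommon variable. The class name can be overridden
# with the first argument.
nfs_patch_json() {
  class="${1:-tkg-vsphere-default-v1.2.0-extended}"
  printf '[{"op":"replace","path":"/spec/topology/class","value":"%s"},' "$class"
  printf '{"op":"add","path":"/spec/topology/variables/-","value":{"name":"nfsCommon","value":true}}]\n'
}

# Usage against the management cluster context (names are hypothetical):
#   kubectl patch cluster workload-1 -n default --type=json -p "$(nfs_patch_json)"
```

Applying the patch has the same effect as the manual edit, so expect the worker nodes to be rolled immediately.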

Troubleshooting: if the worker nodes are recreated but the NFS client is still not installed, run the commands below on the node to confirm whether the package was installed successfully:

journalctl | grep nfs-client
journalctl | grep apt-get
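
The preKubeadmCommands are executed by cloud-init during first boot, so their output usually also lands in the cloud-init output log. The sketch below searches that log for the installation steps. The path /var/log/cloud-init-output.log is the cloud-init default and is an assumption about the node image; show_nfs_install_log is an illustrative helper name.

```shell
#!/bin/sh
# Print the log lines (with line numbers) that mention the NFS packages
# or the repository setup. Returns 2 if the log cannot be read and
# 1 if nothing matches.
show_nfs_install_log() {
  log="${1:-/var/log/cloud-init-output.log}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  grep -nE 'nfs-common|libnfsidmap|add-apt-repository' "$log"
}

# Usage on a worker node (root is usually needed to read the log):
#   sudo sh -c '. ./nfs_log.sh; show_nfs_install_log'
```

No matching lines suggests that the patch was not applied to the node, for example because nfsCommon was not set to true on the cluster.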

This KB is based on the following GitHub repository:

https://github.com/logankimmel/tkgm-nfs-common/tree/master

Additional Information

In case of emergencies, you can manually install the NFS packages temporarily on the worker node.

NOTE: This example points to an APT repository, my.repo.com, that is known to host the Ubuntu packages. Replace it with your own repository.

ssh capv@${WORKER_NODE_IPADDRESS}
sudo add-apt-repository -s https://my.repo.com/ubuntu/ jammy main -y
sudo apt update -y
sudo apt-get install -y libnfsidmap1=1:2.6.1-1ubuntu1 --allow-downgrades --allow-change-held-packages
sudo apt-get install -y nfs-common --allow-change-held-packages