Update cluster class to install NFS client on Ubuntu 22.04 and TKGm 2.5.1

Article ID: 376737


Products

VMware Tanzu Kubernetes Grid Plus, Tanzu Kubernetes Grid, VMware Tanzu Kubernetes Grid, VMware Tanzu Kubernetes Grid 1.x, VMware Tanzu Kubernetes Grid Management, VMware Tanzu Kubernetes Grid Plus 1.x

Issue/Introduction

  • After upgrading Ubuntu-based classy (class-based) clusters to Ubuntu 22.04 on TKGm 2.5.1, the libnfsidmap1 and nfs-common packages are no longer available (previously, they were installed by default).

  • For clusters previously built with the NFS client, this causes NFS volume mount failures after the upgrade.

  • Example errors when describing the K8s node:

kubectl describe node $NODE

  Type     Reason       Age                    From     Message
  ----     ------       ----                   ----     -------
  Warning  FailedMount  54s (x5138 over 7d5h)  kubelet  MountVolume.SetUp failed for volume "nfs-subdir-external-provisioner-root" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs xxx-yyy-zzz:/feature-internal-calico /var/lib/kubelet/pods/c4e0138c-de9a-4c6f-b396-29333cc9b460/volumes/kubernetes.io~nfs/nfs-subdir-external-provisioner-root
Output: mount: /var/lib/kubelet/pods/c4e0138c-de9a-4c6f-b396-29333cc9b460/volumes/kubernetes.io~nfs/nfs-subdir-external-provisioner-root: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
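
The "bad option" error indicates that the /sbin/mount.nfs helper, which ships in the nfs-common package, is missing from the node. As a quick confirmation (a minimal sketch; run on the affected worker node):

# Check for the NFS mount helper binary; it is missing on affected nodes.
ls -l /sbin/mount.nfs

# Check whether the NFS client packages are installed.
dpkg -l nfs-common libnfsidmap1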

Environment

  • TKGm 2.5.1
  • TKGm 2.5.x

Cause

  • NFS client packages were removed in 2.5.1 because the NFS client relies on rpcbind, which listens on port 111 by default.

    • The NFS client packages were disabled, by default, to comply with the Ubuntu CIS Benchmark C-2.3.6.

  • TKGm does not support external NFS; NFS is supported only as a datastore via CSI.

  • Reinstalling the NFS client packages exposes a known security vulnerability and is outside the scope of Support.

  • If operators understand the risks, there are two alternatives for enabling the NFS client packages and bypassing the default security hardening.
    1. Bring Your Own Image (BYOI) - Change the variables passed when building the image (adding these packages), and export a new template that can be used.

              NOTE: This is a more complex procedure and is not covered in this article.

    2. Create a custom ClusterClass that installs the packages before the node joins the cluster (preKubeadmCommands) - Procedure below.
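
If you choose to proceed, you can verify the resulting exposure yourself. A minimal check (assuming the ss utility, present on stock Ubuntu 22.04) shows whether rpcbind is listening on port 111 once the packages are installed:

# rpcbind typically listens on TCP and UDP port 111 after nfs-common is installed.
sudo ss -tulnp | grep ':111'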

Resolution

The steps below can be applied to an existing cluster or a new cluster.

Both require the creation of a custom ClusterClass.

 

Prerequisites

  • TKGm management cluster is created (tested with 2.5)

  • ytt installed

  • kubectl installed and set to management cluster context

  • tanzu CLI
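
A quick way to confirm the tooling prerequisites (a minimal sketch):

# Confirm the tools are available and kubectl points at the management cluster.
ytt version
tanzu version
kubectl config current-context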

 

Create Custom ClusterClass

In this section, we create a custom ClusterClass that deploys worker nodes in a workload cluster with the NFS client packages (libnfsidmap1 and nfs-common) installed. This ClusterClass will be used for both existing clusters and new clusters. Creating a ClusterClass is roughly documented here.

NOTE: In this example, we point to an APT repository, my.repo.com, where we know the Ubuntu packages can be found. You will need to change this to your own repository.

 

    1. In Tanzu Kubernetes Grid 2.3.0 and later, after you deploy a management cluster, you can find the default ClusterClass manifest in the ~/.config/tanzu/tkg/clusterclassconfigs folder.

      • cp ~/.config/tanzu/tkg/clusterclassconfigs/tkg-vsphere-default-v1.2.0.yaml .

    2. To customize your ClusterClass manifest, you create ytt overlay files alongside the manifest.

      • mkdir overlays
        cd overlays

         

      • After creating the files below, return to the top folder:

        cd ..
      • Create a new file nfscommon.yaml with the following content:

      • #@ load("@ytt:overlay", "overlay")
        
        #@overlay/match by=overlay.subset({"kind":"ClusterClass"})
        ---
        apiVersion: cluster.x-k8s.io/v1beta1
        kind: ClusterClass
        metadata:
          name: tkg-vsphere-default-v1.2.0-extended
        spec:
          #@overlay/match missing_ok=True
          variables:
          #@overlay/append
          - name: nfsCommon
            required: false
            schema:
              openAPIV3Schema:
                type: boolean
                default: false
          #@overlay/match expects=1
          patches:
          #@overlay/append
          - name: nfs
            enabledIf: '{{ .nfsCommon }}'
            definitions:
              - selector:
                  apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
                  kind: KubeadmConfigTemplate
                  matchResources:
                    machineDeploymentClass:
                      names:
                        - tkg-worker
                jsonPatches:
                  - op: add
                    path: /spec/template/spec/preKubeadmCommands/-
                    value: |
                      sudo add-apt-repository -s 'deb https://my.repo.com/ubuntu/ jammy main' -y && \
                      sudo apt update -y && \
                      sudo apt-get install -y libnfsidmap1=1:2.6.1-1ubuntu1 --allow-downgrades --allow-change-held-packages && \
                      sudo apt-get install -y nfs-common --allow-change-held-packages
        
      • Create a file filter.yaml with the following content:

        #@ load("@ytt:overlay", "overlay")
        
        #@overlay/match by=overlay.not_op(overlay.subset({"kind": "ClusterClass"})),expects="0+"
        ---
        #@overlay/remove
    3. Use the default ClusterClass manifest from step 1 to generate the base ClusterClass:
      • ytt -f tkg-vsphere-default-v1.2.0.yaml -f overlays/filter.yaml > default_cc.yaml
    4. Generate the custom ClusterClass; this command applies all files from the overlays folder:
      • ytt -f default_cc.yaml -f overlays/ > custom_cc.yaml
    5. Verify that the generated file contains the new name and the apt-get install commands (a quick check is sketched after this list), then install the custom ClusterClass in the management cluster:
      • kubectl apply -f custom_cc.yaml
    6. You should see the following output when you run kubectl get clusterclasses:
      • NAME                                  AGE
        tkg-vsphere-default-v1.2.0            21h
        tkg-vsphere-default-v1.2.0-extended   20h
    7. We have now created an "extended" ClusterClass that accepts a new variable: nfsCommon
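
A quick way to perform the verification from step 5, assuming the file names used above:

# Confirm the extended name, the new variable, and the install commands are present.
grep -n 'tkg-vsphere-default-v1.2.0-extended' custom_cc.yaml
grep -n 'nfsCommon' custom_cc.yaml
grep -n 'apt-get install' custom_cc.yaml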


Create New Cluster

To create a new cluster with the custom ClusterClass, follow the steps below. The contents of cluster_overlay.yaml are:

    • #@ load("@ytt:overlay", "overlay")
      
      #@overlay/match by=overlay.subset({"kind":"Cluster"})
      ---
      apiVersion: cluster.x-k8s.io/v1beta1
      kind: Cluster
      spec:
        topology:
          class: tkg-vsphere-default-v1.2.0-extended
          variables:
          - name: nfsCommon
            value: true

 

  • Copy the config file to your working directory. There are multiple ways to complete this step; it is mainly for demo purposes:
    • cp ~/.config/tanzu/tkg/clusterconfigs/{config_file}.yaml ./workload-1.yaml
  • Generate the default workload cluster manifest:
    • tanzu cluster create --file workload-1.yaml --dry-run > default_cluster.yaml
  • Using the overlay, create the custom manifest:
    • ytt -f default_cluster.yaml -f cluster_overlay.yaml > custom_cluster.yaml
  • Deploy
    • tanzu cluster create -f custom_cluster.yaml
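
To sanity-check custom_cluster.yaml before running the deploy step, a minimal sketch using the file names from these steps:

# The topology class should reference the extended ClusterClass...
grep -n 'class: tkg-vsphere-default-v1.2.0-extended' custom_cluster.yaml
# ...and the nfsCommon variable should be set to true.
grep -A1 'name: nfsCommon' custom_cluster.yaml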

 

 

Rebase Existing Cluster(s)

For existing clusters, the procedure is similar: we have to update two fields on the existing Cluster object.

NOTE: Rebasing existing clusters to a custom ClusterClass triggers a rolling redeployment of the existing nodes. This may impact cluster and workload functionality.

    1. Modify .spec.topology.class on the existing Cluster object. Change the value to reference the extended ClusterClass: tkg-vsphere-default-v1.2.0-extended

    2. Add the variable nfsCommon in .spec.topology.variables, as seen in the example below (one way to apply both changes is sketched after this list):

      • spec:
        ...
          topology:
            class: tkg-vsphere-default-v1.2.0-extended
            controlPlane:
              metadata:
                annotations:
                  run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
              replicas: 1
            variables:
            - name: nfsCommon
              value: true
            - name: cni
              value: antrea
            - name: controlPlaneCertificateRotation
        ...
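
One way to apply the two changes above (a minimal sketch; the cluster name and namespace are placeholders for your environment):

# Opens the Cluster object for in-place editing; update .spec.topology.class
# and add the nfsCommon variable under .spec.topology.variables as shown above.
kubectl edit cluster <cluster-name> -n <namespace>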

 

The update will trigger an immediate rollout of the worker nodes, recreating them with the NFS client packages installed.

 

 

Troubleshooting

If the worker nodes are recreated but the NFS client is not installed, run the commands below on an affected node to confirm whether the packages were installed successfully:

      • journalctl | grep nfs-client
      • journalctl | grep apt-get
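
Because preKubeadmCommands run during node bootstrap, on CAPV-provisioned Ubuntu nodes the package installation output typically also lands in the cloud-init log. A minimal sketch for checking it and the final package state:

# Search the bootstrap log for the package installation output.
sudo grep -i 'nfs-common' /var/log/cloud-init-output.log

# Confirm the final package state.
dpkg -l nfs-common libnfsidmap1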

Additional Information

In case of emergency, you can temporarily install the NFS packages manually on the worker node.

NOTE: In this example, we point to an APT repository, my.repo.com, where we know the Ubuntu packages can be found. You will need to change this to your own repository.

ssh capv@${WORKER_NODE_IPADDRESS}
sudo add-apt-repository -s 'deb https://my.repo.com/ubuntu/ jammy main' -y
sudo apt update -y
sudo apt-get install -y libnfsidmap1=1:2.6.1-1ubuntu1 --allow-downgrades --allow-change-held-packages
sudo apt-get install -y nfs-common --allow-change-held-packages
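
After the manual installation, you can confirm the client works with a test mount (a minimal sketch; the server and export path are placeholders):

# Test-mount an NFS export, then unmount it immediately.
sudo mount -t nfs <nfs-server>:/<export-path> /mnt && sudo umount /mnt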