After upgrading Ubuntu-based classy clusters to TKGm 2.5.1 on Ubuntu 22.04, the NFS client is no longer available (it was previously installed by default).
Example errors when describing the K8s node:
kubectl describe node $NODE
Type     Reason       Age                    From     Message
----     ------       ----                   ----     -------
Warning  FailedMount  54s (x5138 over 7d5h)  kubelet  MountVolume.SetUp failed for volume "nfs-subdir-external-provisioner-root" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs xxx-yyy-zzz:/feature-internal-calico /var/lib/kubelet/pods/c4e0138c-de9a-4c6f-b396-29333cc9b460/volumes/kubernetes.io~nfs/nfs-subdir-external-provisioner-root
Output: mount: /var/lib/kubelet/pods/c4e0138c-de9a-4c6f-b396-29333cc9b460/volumes/kubernetes.io~nfs/nfs-subdir-external-provisioner-root: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
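A quick way to confirm the missing client on an affected worker node (a sketch; capv is the default node user on TKGm nodes, and the exact dpkg wording may vary):
ssh capv@${WORKER_NODE_IPADDRESS}
ls /sbin/mount.nfs   # fails with "No such file or directory" when nfs-common is missing
dpkg -s nfs-common   # reports that the package is not installed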
The NFS client packages were removed in 2.5.1 because NFS relies on rpcbind (which listens on port 111 by default); rpcbind was disabled to comply with the Ubuntu CIS Benchmark C-2.3.6.
TKGm does not support external NFS, and so far there have been no reports of its usage. NFS is only supported as a datastore via CSI.
Reinstalling the packages by default would regress a security fix and is therefore out of scope.
There are two alternatives for reverting this part of the hardening:
1) Bring Your Own Image: change the variables passed during the image build (adding these packages) and export a new template that can be used. This is a more complex procedure and is not covered in this article.
2) Create a custom ClusterClass that installs the packages before the node joins the cluster (preKubeadmCommands). The procedure is below.
Prerequisites
TKGm management cluster is created (tested with 2.5)
ytt installed
kubectl installed and set to the management cluster context
Tanzu CLI installed
This process involves creating a custom ClusterClass that allows worker nodes in a workload cluster to be deployed with the NFS common utilities installed. Creating custom ClusterClasses is roughly documented here.
NOTE: In this example we point to an APT repository, my.repo.com, where the Ubuntu packages can be found. You will need to change this to your own repository.
cp ~/.config/tanzu/tkg/clusterclassconfigs/tkg-vsphere-default-v1.2.0.yaml .
mkdir overlays
cd overlays
After creating the two files below, return to the top folder: cd ..
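At that point the working directory should look roughly like this (file names as used in this article):
.
├── tkg-vsphere-default-v1.2.0.yaml
└── overlays
    ├── filter.yaml
    └── nfscommon.yaml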
Create a file nfscommon.yaml:
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind":"ClusterClass"})
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: tkg-vsphere-default-v1.2.0-extended
spec:
  #@overlay/match missing_ok=True
  variables:
  #@overlay/append
  - name: nfsCommon
    required: false
    schema:
      openAPIV3Schema:
        type: boolean
        default: false
  #@overlay/match expects=1
  patches:
  #@overlay/append
  - name: nfs
    enabledIf: '{{ .nfsCommon }}'
    definitions:
    - selector:
        apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
        kind: KubeadmConfigTemplate
        matchResources:
          machineDeploymentClass:
            names:
            - tkg-worker
      jsonPatches:
      - op: add
        path: /spec/template/spec/preKubeadmCommands/-
        value: |
          sudo add-apt-repository -s https://my.repo.com/ubuntu/ jammy main -y && \
          sudo apt update -y && \
          sudo apt-get install -y libnfsidmap1=1:2.6.1-1ubuntu1 --allow-downgrades --allow-change-held-packages && \
          sudo apt-get install -y nfs-common --allow-change-held-packages
Create a file filter.yaml:
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.not_op(overlay.subset({"kind": "ClusterClass"})),expects="0+"
---
#@overlay/remove
Generate the custom ClusterClass definition and apply it to the management cluster:
ytt -f tkg-vsphere-default-v1.2.0.yaml -f overlays/filter.yaml > default_cc.yaml
ytt -f default_cc.yaml -f overlays/ > custom_cc.yaml
kubectl apply -f custom_cc.yaml
Verify that the new ClusterClass is available:
kubectl get clusterclasses
NAME                                  AGE
tkg-vsphere-default-v1.2.0            21h
tkg-vsphere-default-v1.2.0-extended   20h
To create a new cluster with the custom class, follow the steps below, using a cluster_overlay.yaml like the following:
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind":"Cluster"})
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
spec:
  topology:
    class: tkg-vsphere-default-v1.2.0-extended
    variables:
    - name: nfsCommon
      value: true
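A typical end-to-end flow (a sketch; the cluster name, config file, and output file names are placeholders) is to render the cluster manifest with a dry run, overlay it, and apply the result:
tanzu cluster create workload-1 --file workload-1-config.yaml --dry-run > workload-1-spec.yaml
ytt -f workload-1-spec.yaml -f cluster_overlay.yaml > workload-1-custom-spec.yaml
kubectl apply -f workload-1-custom-spec.yaml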
For existing clusters the procedure is similar: two fields have to be updated on the existing Cluster object, the class and the nfsCommon variable:
spec:
  ...
  topology:
    class: tkg-vsphere-default-v1.2.0-extended
    controlPlane:
      metadata:
        annotations:
          run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
      replicas: 1
    variables:
    - name: nfsCommon
      value: true
    - name: cni
      value: antrea
    - name: controlPlaneCertificateRotation
    ...
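These fields can be changed in place on the live object, for example (assuming the cluster lives in the default namespace; adjust as needed):
kubectl edit cluster ${CLUSTER_NAME} -n default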
The update will immediately trigger a rolling update of the worker nodes, recreating them with nfs-common installed.
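The rollout can be watched from the management cluster context (a sketch; adjust the namespace to wherever the workload cluster objects live):
kubectl get machines -n default -w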
Troubleshooting: if the worker nodes are recreated but the NFS client is still not installed, use the commands below to confirm whether the package was installed successfully.
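A minimal check on a recreated worker (capv is the default node user; /sbin/mount.nfs is the helper the kubelet error above complained about):
ssh capv@${WORKER_NODE_IPADDRESS}
dpkg -s nfs-common | grep -i ^status   # expect "Status: install ok installed"
ls -l /sbin/mount.nfs                  # shipped by nfs-common
If the package is missing, /var/log/cloud-init-output.log on the node should contain any apt errors from the preKubeadmCommands step.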
The git page below was used as a guide for this KB.
In case of emergencies, you can manually install the NFS packages temporarily on the worker node.
NOTE: In this example we point to an APT repository, my.repo.com, where the Ubuntu packages can be found. You will need to change this to your own repository.
ssh capv@${WORKER_NODE_IPADDRESS}
sudo add-apt-repository -s https://my.repo.com/ubuntu/ jammy main -y
sudo apt update -y
sudo apt-get install -y libnfsidmap1=1:2.6.1-1ubuntu1 --allow-downgrades --allow-change-held-packages
sudo apt-get install -y nfs-common --allow-change-held-packages
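To confirm the client works after the manual installation, a test mount against the NFS export (placeholder server and path taken from the error above) can be attempted:
sudo mount -t nfs xxx-yyy-zzz:/feature-internal-calico /mnt && sudo umount /mnt
Keep in mind this manual installation is not persistent: any worker that Cluster API recreates will come back without the packages unless the custom ClusterClass above is used.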