Update cluster class to install NFS client on Ubuntu 22.04 and TKGm 2.5.1

Article ID: 376737


Products

VMware Tanzu Kubernetes Grid Plus, Tanzu Kubernetes Grid, VMware Tanzu Kubernetes Grid, VMware Tanzu Kubernetes Grid 1.x, VMware Tanzu Kubernetes Grid Management, VMware Tanzu Kubernetes Grid Plus 1.x

Issue/Introduction

  • After upgrading Ubuntu-based classy (class-based) clusters to Ubuntu 22.04 on TKGm 2.5.1, the libnfsidmap1 and nfs-common packages are no longer available (previously, they were installed by default).

  • For clusters previously built with the NFS client, this causes NFS volume mount failures after the upgrade.

  • Example errors when describing the K8s node:

kubectl describe node $NODE

  Type     Reason       Age                    From     Message
  ----     ------       ----                   ----     -------
  Warning  FailedMount  54s (x5138 over 7d5h)  kubelet  MountVolume.SetUp failed for volume "nfs-subdir-external-provisioner-root" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs xxx-yyy-zzz:/feature-internal-calico /var/lib/kubelet/pods/c4e0138c-de9a-4c6f-b396-29333cc9b460/volumes/kubernetes.io~nfs/nfs-subdir-external-provisioner-root
Output: mount: /var/lib/kubelet/pods/c4e0138c-de9a-4c6f-b396-29333cc9b460/volumes/kubernetes.io~nfs/nfs-subdir-external-provisioner-root: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
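
The "bad option" error indicates that the /sbin/mount.nfs helper, which ships in the nfs-common package, is missing from the node. As a quick confirmation (a minimal sketch; run on the affected worker node):

# Check for the NFS mount helper binary; it is missing on affected nodes.
ls -l /sbin/mount.nfs

# Check whether the NFS client packages are installed.
dpkg -l nfs-common libnfsidmap1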

Environment

  • TKGm 2.5.1
  • TKGm 2.5.x

Cause

  • NFS client packages were removed in 2.5.1 because the NFS client relies on rpcbind, which listens on port 111 by default.

    • The NFS client packages were disabled, by default, to comply with the Ubuntu CIS Benchmark C-2.3.6.

  • TKGm does not support external NFS; NFS is supported only as a datastore via CSI.

  • Reinstalling the NFS client packages exposes a known security vulnerability and is outside the scope of Support.

  • If operators understand the risks, there are two alternatives for enabling the NFS client packages and bypassing the default security hardening.
    1. Bring Your Own Image (BYOI) - Change the variables passed when building the image (adding these packages), and export a new template that can be used.

              NOTE: This is a more complex procedure and is not covered in this article.

    2. Create a custom ClusterClass that installs the packages before the node joins the cluster (preKubeadmCommands) - Procedure below.
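
If you choose to proceed, you can verify the resulting exposure yourself. A minimal check (assuming the ss utility, present on stock Ubuntu 22.04) shows whether rpcbind is listening on port 111 once the packages are installed:

# rpcbind typically listens on TCP and UDP port 111 after nfs-common is installed.
sudo ss -tulnp | grep ':111'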

Resolution

The steps below can be applied to an existing cluster or a new cluster.

Both require the creation of a custom ClusterClass.

 

Prerequisites

  • TKGm management cluster is created (tested with 2.5)

  • ytt installed

  • kubectl installed and set to management cluster context

  • tanzu CLI
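
A quick way to confirm the tooling prerequisites (a minimal sketch):

# Confirm the tools are available and kubectl points at the management cluster.
ytt version
tanzu version
kubectl config current-context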

 

Create Custom ClusterClass

In this section, we create a custom ClusterClass that deploys worker nodes in a workload cluster with the NFS client packages (libnfsidmap1 and nfs-common) installed. This ClusterClass will be used for both existing clusters and new clusters. Creating a ClusterClass is roughly documented here.

NOTE: In this example, we point to an APT repository, my.repo.com, where we know the Ubuntu packages can be found. You will need to change this to your own repository.

 

    1. In Tanzu Kubernetes Grid 2.3.0 and later, after you deploy a management cluster, you can find the default ClusterClass manifest in the ~/.config/tanzu/tkg/clusterclassconfigs folder.

      • cp ~/.config/tanzu/tkg/clusterclassconfigs/tkg-vsphere-default-v1.2.0.yaml .

    2. To customize your ClusterClass manifest, you create ytt overlay files alongside the manifest.

      • mkdir overlays
        cd overlays

         

      • After creating the files below, return to the top folder:

        cd ..
      • Create a new file nfscommon.yaml with the following content:

      • #@ load("@ytt:overlay", "overlay")
        
        #@overlay/match by=overlay.subset({"kind":"ClusterClass"})
        ---
        apiVersion: cluster.x-k8s.io/v1beta1
        kind: ClusterClass
        metadata:
          name: tkg-vsphere-default-v1.2.0-extended
        spec:
          #@overlay/match missing_ok=True
          variables:
          #@overlay/append
          - name: nfsCommon
            required: false
            schema:
              openAPIV3Schema:
                type: boolean
                default: false
          #@overlay/match expects=1
          patches:
          #@overlay/append
          - name: nfs
            enabledIf: '{{ .nfsCommon }}'
            definitions:
              - selector:
                  apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
                  kind: KubeadmConfigTemplate
                  matchResources:
                    machineDeploymentClass:
                      names:
                        - tkg-worker
                jsonPatches:
                  - op: add
                    path: /spec/template/spec/preKubeadmCommands/-
                    value: |
                      sudo add-apt-repository -s 'deb https://my.repo.com/ubuntu/ jammy main' -y && \
                      sudo apt update -y && \
                      sudo apt-get install -y libnfsidmap1=1:2.6.1-1ubuntu1 --allow-downgrades --allow-change-held-packages && \
                      sudo apt-get install -y nfs-common --allow-change-held-packages
        
      • Create a file filter.yaml with the following content:

        #@ load("@ytt:overlay", "overlay")
        
        #@overlay/match by=overlay.not_op(overlay.subset({"kind": "ClusterClass"})),expects="0+"
        ---
        #@overlay/remove
    3. Use the default ClusterClass manifest from step 1 to generate the base ClusterClass:
      • ytt -f tkg-vsphere-default-v1.2.0.yaml -f overlays/filter.yaml > default_cc.yaml
    4. Generate the custom ClusterClass; this command applies all files from the overlays folder:
      • ytt -f default_cc.yaml -f overlays/ > custom_cc.yaml
    5. Verify that the generated file contains the new name and the apt-get install commands (a quick check is sketched after this list), then install the custom ClusterClass in the management cluster:
      • kubectl apply -f custom_cc.yaml
    6. You should see the following output when you run kubectl get clusterclasses:
      • NAME                                  AGE
        tkg-vsphere-default-v1.2.0            21h
        tkg-vsphere-default-v1.2.0-extended   20h
    7. We have now created an "extended" ClusterClass that accepts a new variable: nfsCommon
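
A quick way to perform the verification from step 5, assuming the file names used above:

# Confirm the extended name, the new variable, and the install commands are present.
grep -n 'tkg-vsphere-default-v1.2.0-extended' custom_cc.yaml
grep -n 'nfsCommon' custom_cc.yaml
grep -n 'apt-get install' custom_cc.yaml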


Create New Cluster

To create a new cluster with the custom ClusterClass, follow the steps below. The contents of cluster_overlay.yaml are:

    • #@ load("@ytt:overlay", "overlay")
      
      #@overlay/match by=overlay.subset({"kind":"Cluster"})
      ---
      apiVersion: cluster.x-k8s.io/v1beta1
      kind: Cluster
      spec:
        topology:
          class: tkg-vsphere-default-v1.2.0-extended
          variables:
          - name: nfsCommon
            value: true

 

  • Copy the config file to your working directory. There are multiple ways to complete this step; it is mainly for demo purposes:
    • cp ~/.config/tanzu/tkg/clusterconfigs/{config_file}.yaml ./workload-1.yaml
  • Generate the default workload cluster manifest:
    • tanzu cluster create --file workload-1.yaml --dry-run > default_cluster.yaml
  • Using the overlay, create the custom manifest:
    • ytt -f default_cluster.yaml -f cluster_overlay.yaml > custom_cluster.yaml
  • Deploy
    • tanzu cluster create -f custom_cluster.yaml
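
To sanity-check custom_cluster.yaml before running the deploy step, a minimal sketch using the file names from these steps:

# The topology class should reference the extended ClusterClass...
grep -n 'class: tkg-vsphere-default-v1.2.0-extended' custom_cluster.yaml
# ...and the nfsCommon variable should be set to true.
grep -A1 'name: nfsCommon' custom_cluster.yaml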

 

 

Rebase Existing Cluster(s)

For existing clusters, the procedure is similar: we have to update two fields on the existing Cluster object.

NOTE: Rebasing existing clusters to a custom ClusterClass triggers a rolling redeployment of the existing nodes. This may impact cluster and workload functionality.

    1. Modify .spec.topology.class on the existing Cluster object. Change the value to reference the extended ClusterClass: tkg-vsphere-default-v1.2.0-extended

    2. Add the variable nfsCommon in .spec.topology.variables, as seen in the example below (one way to apply both changes is sketched after this list):

      • spec:
        ...
          topology:
            class: tkg-vsphere-default-v1.2.0-extended
            controlPlane:
              metadata:
                annotations:
                  run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
              replicas: 1
            variables:
            - name: nfsCommon
              value: true
            - name: cni
              value: antrea
            - name: controlPlaneCertificateRotation
        ...
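
One way to apply the two changes above (a minimal sketch; the cluster name and namespace are placeholders for your environment):

# Opens the Cluster object for in-place editing; update .spec.topology.class
# and add the nfsCommon variable under .spec.topology.variables as shown above.
kubectl edit cluster <cluster-name> -n <namespace>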

 

The update will trigger an immediate rollout of the worker nodes, recreating them with the NFS client packages installed.

 

 

Troubleshooting

If the worker nodes are recreated but the NFS client is not installed, run the commands below on an affected node to confirm whether the packages were installed successfully:

      • journalctl | grep nfs-client
      • journalctl | grep apt-get
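
Because preKubeadmCommands run during node bootstrap, on CAPV-provisioned Ubuntu nodes the package installation output typically also lands in the cloud-init log. A minimal sketch for checking it and the final package state:

# Search the bootstrap log for the package installation output.
sudo grep -i 'nfs-common' /var/log/cloud-init-output.log

# Confirm the final package state.
dpkg -l nfs-common libnfsidmap1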

Additional Information

In case of emergency, you can temporarily install the NFS packages manually on the worker node.

NOTE: In this example, we point to an APT repository, my.repo.com, where we know the Ubuntu packages can be found. You will need to change this to your own repository.

ssh capv@${WORKER_NODE_IPADDRESS}
sudo add-apt-repository -s 'deb https://my.repo.com/ubuntu/ jammy main' -y
sudo apt update -y
sudo apt-get install -y libnfsidmap1=1:2.6.1-1ubuntu1 --allow-downgrades --allow-change-held-packages
sudo apt-get install -y nfs-common --allow-change-held-packages
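
After the manual installation, you can confirm the client works with a test mount (a minimal sketch; the server and export path are placeholders):

# Test-mount an NFS export, then unmount it immediately.
sudo mount -t nfs <nfs-server>:/<export-path> /mnt && sudo umount /mnt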