vmware-system-user account expired on TKGS Guest Cluster nodes

Article ID: 319375


Products

VMware vSphere ESXi
VMware vSphere with Tanzu

Issue/Introduction

This article provides a DaemonSet that can be applied to Guest Clusters to update the vmware-system-user password expiry, allowing SSH sessions to Guest Cluster nodes when required.

Symptoms:

  • Users are unable to connect via SSH directly to TKGS Guest Cluster nodes using the vmware-system-user account.
  • In TKR 1.23.8, the vmware-system-user password is set to expire in 60 days as part of STIG hardening.
  • While this is implemented as part of security hardening, it prevents SSH login to the nodes once the password has expired (see the illustrative chage output below).
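
For reference, password aging for the account can be inspected on a node with chage. The output below is illustrative only; the dates are placeholders, but a 60-day maximum age like this indicates the expiry policy is in effect:

    # chage -l vmware-system-user
    Last password change                                    : Jan 01, 2024
    Password expires                                        : Mar 01, 2024
    Password inactive                                       : never
    Account expires                                         : never
    Minimum number of days between password change          : 0
    Maximum number of days between password change          : 60
    Number of days of warning before password expires       : 7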


Environment

VMware vSphere 7.0 with Tanzu

Resolution

Change the vmware-system-user password expiry on existing clusters using the following DaemonSet:

 

  1. Create a YAML file called pass_expiry.yaml using the following command; copy everything from the cat <<EOF>> line through the closing EOF line:

    # cat <<EOF>> pass_expiry.yaml
    ---
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: cluster-admin
    spec:
      selector:
        matchLabels:
          tkgs: cluster-admin
      template:
        metadata:
          labels:
            tkgs: cluster-admin
        spec:
          volumes:
            - name: hostfs
              hostPath:
                path: /
          initContainers:
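            # init container: chroot into the node's root filesystem
            # (mounted at /host) and set the password to never expire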
            - name: init
              image: ubuntu:23.04
              command:
                - /bin/sh
                - -xc
                - |
                  chroot /host chage -l vmware-system-user \
                  && chroot /host chage -m 0 -M -1 vmware-system-user \
                  && echo expiry updated \
                  && chroot /host chage -l vmware-system-user \
                  && echo done
              volumeMounts:
                - name: hostfs
                  mountPath: /host
          containers:
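            # pause container keeps the pod alive after the init
            # container completes (image path as preloaded on TKGS nodes)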
            - name: sleep
              image: localhost:5000/vmware.io/pause:3.6
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/control-plane
            operator: Exists
          - effect: NoSchedule
            key: node-role.kubernetes.io/master
            operator: Exists
          - key: CriticalAddonsOnly
            operator: Exists
          - effect: NoExecute
            key: node.alpha.kubernetes.io/notReady
            operator: Exists
          - effect: NoExecute
            key: node.alpha.kubernetes.io/unreachable
            operator: Exists
          - effect: NoSchedule
            key: kubeadmNode
            operator: Equal
            value: master
    EOF


    This creates a YAML file called pass_expiry.yaml, which we can then apply to the Guest Cluster.
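
    For clarity, the only change the init container makes on each node is made with chage (run through chroot /host so it targets the node's root filesystem); the echo lines simply mark progress in the pod logs:

    # chage -l vmware-system-user            # print the current password aging settings
    # chage -m 0 -M -1 vmware-system-user    # -m 0: no minimum age; -M -1: remove the maximum age, so the password never expires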


  2. Use the kubectl vsphere login command to log into your Guest Cluster:


    # kubectl vsphere login --insecure-skip-tls-verify --server <SUPERVISOR_VIP> --tanzu-kubernetes-cluster-namespace <GUEST_CLUSTER_NAMESPACE> --tanzu-kubernetes-cluster-name <GUEST_CLUSTER_NAME>
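
    For example, with placeholder values substituted for your environment, then switching kubectl to the Guest Cluster context that the login creates (named after the cluster):

    # kubectl vsphere login --insecure-skip-tls-verify --server 192.168.123.2 --tanzu-kubernetes-cluster-namespace demo-ns --tanzu-kubernetes-cluster-name demo-tkc
    # kubectl config use-context demo-tkc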


  3. Apply the DaemonSet YAML created in Step 1:


    # kubectl apply -f pass_expiry.yaml
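
    To verify, confirm the DaemonSet has rolled out a pod per node and check an init container's logs; per the script in Step 1, a successful run ends with "expiry updated" followed by "done":

    # kubectl rollout status ds/cluster-admin
    # kubectl get pods -l tkgs=cluster-admin -o wide
    # kubectl logs ds/cluster-admin -c init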


*Important*:

  • In newer TKC versions supported on vSphere 7.x and 8.x, the DaemonSet fails to configure the vmware-system-user account to never expire on the control plane (master) nodes.

    To resolve this, the toleration for running on the control plane nodes was updated as follows:

    - effect: "NoSchedule"
      key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"

On these versions, control plane nodes are tainted with the node-role.kubernetes.io/control-plane key rather than the legacy node-role.kubernetes.io/master key. The DaemonSet above already carries this toleration; all other tolerations are left unchanged.
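As a quick check, the following standard kubectl command (shown here as a suggestion) lists the taint keys carried by each node, so you can confirm which toleration your cluster's control plane nodes require:

kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'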

 
  • If the Kubernetes version is 1.25 or later, it is necessary to set the correct Pod Security level on the 'default' namespace; otherwise, the DaemonSet pods will not be scheduled.
    The describe output of the DaemonSet will show the Pod Security errors:
Events:
  Type     Reason        Age   From                  Message
  ----     ------        ----  ----                  -------
  Warning  FailedCreate  49m   daemonset-controller  Error creating: pods "cluster-admin-bdrkf" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "init", "sleep" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "init", "sleep" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "hostfs" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or containers "init", "sleep" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "init", "sleep" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

In that situation, apply the Pod Security setting below to the 'default' namespace and re-apply the DaemonSet to fix the issue:

kubectl label --overwrite ns default pod-security.kubernetes.io/enforce=privileged
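
To confirm the label took effect, check the namespace labels. Once the DaemonSet has completed on every node, you may optionally restore the namespace's previous enforcement level; the example below assumes it was previously set to 'restricted':

kubectl get ns default --show-labels
kubectl label --overwrite ns default pod-security.kubernetes.io/enforce=restricted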