vmware-system-user account expired on vSphere Supervisor Workload Cluster Nodes

Article ID: 319375


Products

VMware vSphere Kubernetes Service
VMware vSphere 7.0 with Tanzu
vSphere with Tanzu
Tanzu Kubernetes Runtime

Issue/Introduction

This article provides a daemonset that can be applied to vSphere Supervisor workload clusters to update the vmware-system-user password expiry, allowing SSH sessions to workload cluster nodes when required.

Symptoms:

  • Users are unable to connect via SSH directly to the workload cluster nodes using the vmware-system-user account.
  • In TKR 1.23.8 and higher, the vmware-system-user password is set to expire in 60 days as part of STIG hardening.
  • While this is implemented as part of security hardening, it blocks SSH login to the nodes once the password has expired.
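For reference, the password ageing policy on a node can be inspected with the standard chage utility once shell access is available (for example, from the init container used in the workaround below). The command and the representative fields shown as comments are illustrative only; exact values vary per node and TKR version:

    sudo chage -l vmware-system-user
    # Representative chage -l fields on an affected node (illustrative values):
    #   Password expires                                 : <roughly 60 days after the last password change>
    #   Maximum number of days between password change   : 60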

 

 

Environment

VMware vSphere 7.0 with Tanzu
VMware vSphere 8.0 with Tanzu

Workload clusters running on TKR 1.23.8 and higher

Cause

In TKR 1.23.8 and higher, the vmware-system-user password is set to expire in 60 days as part of STIG hardening.

Resolution

Change the vmware-system-user password expiry on existing clusters using the following daemonset workaround:

  1. Create a yaml file called pass_expiry.yaml with the following contents:
    vi pass_expiry.yaml


    ---
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: cluster-admin
    spec:
      selector:
        matchLabels:
          tkgs: cluster-admin
      template:
        metadata:
          labels:
            tkgs: cluster-admin
        spec:
          volumes:
            # Expose the node's root filesystem so chage can be run against the host via chroot
            - name: hostfs
              hostPath:
                path: /
          initContainers:
            # One-shot container per node: chage -m 0 sets the minimum password age to 0 and
            # -M -1 removes the maximum password age so the password never expires
            - name: init
              image: ubuntu:23.04
              command:
                - /bin/sh
                - -xc
                - |
                  chroot /host chage -l vmware-system-user \
                  && chroot /host chage -m 0 -M -1 vmware-system-user \
                  && echo expiry updated \
                  && chroot /host chage -l vmware-system-user \
                  && echo done
              volumeMounts:
                - name: hostfs
                  mountPath: /host
          containers:
            # Pause container keeps the pod Running after the init container completes
            - name: sleep
              image: localhost:5000/vmware.io/pause:3.6
          # Tolerations allow the pod to run on control plane and otherwise tainted nodes
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/control-plane
            operator: Exists
          - effect: NoSchedule
            key: node-role.kubernetes.io/master
            operator: Exists
          - key: CriticalAddonsOnly
            operator: Exists
          - effect: NoExecute
            key: node.alpha.kubernetes.io/notReady
            operator: Exists
          - effect: NoExecute
            key: node.alpha.kubernetes.io/unreachable
            operator: Exists
          - effect: NoSchedule
            key: kubeadmNode
            operator: Equal
            value: master


    This creates a YAML file called pass_expiry.yaml which can then be applied to the workload cluster, also known as a Guest Cluster.
    As per the YAML above, this daemonset creates a pod on each node in the workload cluster and runs commands that set the vmware-system-user password to no longer be expired and to never expire; the commands run again whenever a pod is recreated (for example, on a new or rebuilt node).
    This daemonset and its pods will persist through workload cluster upgrades to prevent vmware-system-user from expiring, but may require a pause image version change afterwards.
    vmware-system-user is VMware by Broadcom Support's break-glass user for troubleshooting workload clusters.


  2. Use the kubectl vsphere login command to log into your workload cluster as per the relevant documentation (see the condensed example after this list).
  3. Apply the daemonset yaml created in Step 1:
    kubectl apply -f pass_expiry.yaml

     

  4. Confirm that the daemonset's READY count matches the total number of nodes in the cluster:
    • Note: The cluster-admin daemonset is often created in the default namespace
      kubectl get ds cluster-admin -n <namespace>

       

  5. If the total Ready count of the daemonset does not match the total number of nodes, describe the cluster-admin pod for more information:
    kubectl get pods -n <namespace> | grep cluster-admin
    
    kubectl describe pod <cluster-admin-pod> -n <namespace>
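For reference, a condensed example of Steps 2-5 is shown below. The Supervisor address, user, namespace and cluster name are placeholders, and the daemonset is assumed to have been created in the default namespace; substitute the values for your environment and refer to the product documentation for the full kubectl vsphere login syntax:

    # Step 2: log into the workload cluster (placeholder values)
    kubectl vsphere login --server=<SUPERVISOR-IP-or-FQDN> \
      --vsphere-username <user>@<domain> \
      --tanzu-kubernetes-cluster-namespace <vsphere-namespace> \
      --tanzu-kubernetes-cluster-name <workload-cluster-name>
    kubectl config use-context <workload-cluster-name>

    # Step 3: apply the daemonset
    kubectl apply -f pass_expiry.yaml

    # Step 4: the READY count should equal the number of nodes in the cluster
    kubectl get ds cluster-admin -n default

    # Optional: the init container logs should show the updated chage output ("expiry updated")
    kubectl logs -n default -l tkgs=cluster-admin -c init --tail=20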

 

For NAPP Clusters:

  1. SSH into one of the NSX Managers to connect to the NAPP cluster.

  2. Create the pass_expiry.yaml file with the contents above:
    vi pass_expiry.yaml
  3. Apply the daemonset:
    napp-k apply -f pass_expiry.yaml
  4. Confirm that the daemonset's READY count matches the total number of nodes in the cluster:
    • Note: The cluster-admin daemonset is often created in the default namespace
      napp-k get ds cluster-admin

       

  5. If the total Ready count of the daemonset does not match the total number of nodes, describe the cluster-admin pod for more information:
    napp-k get pods | grep cluster-admin
    
    napp-k describe pod <cluster-admin-pod>
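Optionally, assuming napp-k accepts the same sub-commands as kubectl (it is used that way throughout this article), the init container logs can also be checked for the "expiry updated" message:

    napp-k logs -l tkgs=cluster-admin -c init --tail=20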

 

Troubleshooting the cluster-admin Daemonset Workaround

Back-off Pulling Image localhost:5000/vmware.io/pause:3.6

This error indicates that the cluster has a different pause image version than specified in the above cluster-admin daemonset YAML.

Pause image versions can change between TKR versions and will need to be updated accordingly.

  1. Connect to the Supervisor cluster context as per the relevant documentation.
  2. Fetch the pause image details for the current TKR version of the affected workload cluster and note down the imageTag value:
    kubectl get cluster -o yaml -n <WORKLOAD CLUSTER NAMESPACE> <WORKLOAD CLUSTER NAME> | grep pause -A3
    
    pause:
                  imageRepository: localhost:5000/vmware.io
                  imageTag: "#.#"
                version: <TKR VERSION>


  3. Connect to the affected workload cluster's context as per the relevant documentation.

  4. Locate the cluster-admin daemonset:
    kubectl get daemonset -A | grep cluster-admin
  5. Edit the cluster-admin daemonset so that pause:#.# matches the pause imageTag version found in Step 2 (a non-interactive alternative is shown after this list):
    • Note: The cluster-admin is often created in the default namespace.
      kubectl edit ds cluster-admin -n <namespace>
      
      containers:
        - name: sleep
          image: localhost:5000/vmware.io/pause:<imageTag version from Step 2>
  6. Confirm that the cluster-admin pod runs successfully on all nodes in the guest cluster:
    kubectl get ds -A | grep cluster-admin
    
    kubectl get pods -A -o wide | grep cluster-admin
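As a non-interactive alternative to the kubectl edit in Step 5, the sleep container image can also be updated with kubectl set image; the tag below is a placeholder for the imageTag value found in Step 2:

    kubectl set image ds/cluster-admin sleep=localhost:5000/vmware.io/pause:<imageTag from Step 2> -n <namespace>

    # The daemonset recreates its pods; confirm they reach Running on every node
    kubectl get pods -n <namespace> -o wide | grep cluster-admin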

 

PodSecurity Errors

  • In newer TKC versions supported on vSphere 7.x and 8.x, the DaemonSet fails to configure the vmware-system-user account to never expire on the control plane (master) nodes.

    To resolve this, the toleration for running on the control plane nodes was updated as follows:
    - effect: "NoSchedule"
      key: "node-role.kubernetes.io/control-plane"
      operator: "Exists"

The control plane node is tainted with the node-role.kubernetes.io/control-plane key, rather than being specifically tainted as a master node. All other tolerations are left unchanged.

  • If the Kubernetes version is 1.25 or higher, it is necessary to set the correct Pod Security level for the namespace in which the cluster-admin daemonset was created.
    Otherwise, the daemonset pods will not be scheduled.
    The describe output of the daemonset will show pod security errors similar to the following:
    Events:
      Type     Reason        Age   From                  Message
      ----     ------        ----  ----                  -------
      Warning  FailedCreate  ##m   daemonset-controller  Error creating: pods "<cluster-admin-podname>" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "init", "sleep" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "init", "sleep" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "hostfs" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or containers "init", "sleep" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "init", "sleep" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")


In those situations, as a temporary workaround, apply the pod security setting below to the namespace in which the cluster-admin daemonset was created, then re-apply the daemonset to fix the issue.

Note: The command below assumes that the cluster-admin daemonset was created in the default namespace; replace the namespace value as appropriate.

kubectl label --overwrite ns default pod-security.kubernetes.io/enforce=privileged
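A minimal end-to-end sequence for this workaround, assuming the cluster-admin daemonset was created in the default namespace, might look like the following; re-applying is shown here as a delete and re-create of the daemonset:

    # Temporarily relax Pod Security enforcement on the namespace
    kubectl label --overwrite ns default pod-security.kubernetes.io/enforce=privileged
    kubectl get ns default --show-labels

    # Re-apply the daemonset and confirm its pods are created on every node
    kubectl delete -f pass_expiry.yaml
    kubectl apply -f pass_expiry.yaml
    kubectl get ds cluster-admin -n default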

Please see the following documentation for more details: Configure PSA for TKR v1.25 and Later