This article provides a daemonset that can be applied to vSphere Supervisor workload clusters to update the vmware-system-user password expiry, allowing SSH sessions to workload cluster nodes when required.
Symptoms:
VMware vSphere 7.0 with Tanzu
VMware vSphere 8.0 with Tanzu
Workload Cluster Running on TKR 1.23.8 and higher
On TKR 1.23.8 and higher, the vmware-system-user password is set to expire in 60 days as part of STIG hardening.
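To confirm the current policy on a node, the account's password aging settings can be inspected directly (an optional check, assuming SSH access to the node is still possible; chage needs root to read another account's aging data):

sudo chage -l vmware-system-user
# "Maximum number of days between password change" shows 60 when the STIG default is in effect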
Change the vmware-system-user password expiry on existing clusters using the following daemonset workaround. First, create the manifest file:
vi pass_expiry.yaml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cluster-admin
spec:
  selector:
    matchLabels:
      tkgs: cluster-admin
  template:
    metadata:
      labels:
        tkgs: cluster-admin
    spec:
      volumes:
      - name: hostfs
        hostPath:
          path: /
      initContainers:
      - name: init
        image: ubuntu:23.04
        command:
        - /bin/sh
        - -xc
        - |
          chroot /host chage -l vmware-system-user \
          && chroot /host chage -m 0 -M -1 vmware-system-user \
          && echo expiry updated \
          && chroot /host chage -l vmware-system-user \
          && echo done
        volumeMounts:
        - name: hostfs
          mountPath: /host
      containers:
      - name: sleep
        image: localhost:5000/vmware.io/pause:3.6
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoExecute
        key: node.alpha.kubernetes.io/notReady
        operator: Exists
      - effect: NoExecute
        key: node.alpha.kubernetes.io/unreachable
        operator: Exists
      - effect: NoSchedule
        key: kubeadmNode
        operator: Equal
        value: master
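For reference, the init container chroots into the node filesystem and runs the equivalent of the following commands on each node (a sketch, assuming root access on the node; -m 0 sets the minimum password age to 0 days and -M -1 disables password expiry):

chage -m 0 -M -1 vmware-system-user   # minimum age 0, maximum age -1 = never expire
chage -l vmware-system-user           # "Password expires" should now read "never"

The sleep container only exists to keep the pod running after the init container completes.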
Apply the daemonset and verify that the cluster-admin pods are created on all nodes:
kubectl apply -f pass_expiry.yaml
kubectl get ds cluster-admin -n <namespace>
kubectl get pods -n <namespace> | grep cluster-admin
kubectl describe pod <cluster-admin-pod> -n <namespace>
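The init container prints the account's aging information before and after the change, so its logs provide a quick confirmation that the expiry was updated (an optional check; <cluster-admin-pod> and <namespace> as in the commands above):

kubectl logs <cluster-admin-pod> -c init -n <namespace>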
For workload clusters that back the NSX Application Platform (NAPP), the same steps can be run using the napp-k command:
vi pass_expiry.yaml
napp-k apply -f pass_expiry.yaml
napp-k get ds cluster-admin
napp-k get pods | grep cluster-admin
napp-k describe pod <cluster-admin-pod>
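The same log check applies here (an assumption, based on napp-k wrapping kubectl in the usual way):

napp-k logs <cluster-admin-pod> -c init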
An image error on the sleep container in the describe output indicates that the cluster uses a different pause image version than the one specified in the cluster-admin daemonset YAML above.
Pause image versions can change between TKR versions and need to be updated in the daemonset accordingly.
Check the pause image tag used by the workload cluster (run against the Supervisor cluster, where the Cluster object resides):
kubectl get cluster -o yaml -n <WORKLOAD CLUSTER NAMESPACE> <WORKLOAD CLUSTER NAME> | grep pause -A3
pause:
  imageRepository: localhost:5000/vmware.io
  imageTag: "#.#"
version: <TKR VERSION>
If the tag differs, locate and edit the cluster-admin daemonset so that the sleep container image matches the cluster's pause image:
kubectl get daemonset -A | grep cluster-admin
kubectl edit ds cluster-admin -n <namespace>
      containers:
      - name: sleep
        image: localhost:5000/vmware.io/pause:<imageTag value from the cluster output above>
Verify that the daemonset pods are now running on all nodes:
kubectl get ds -A | grep cluster-admin
kubectl get pods -A -o wide | grep cluster-admin
The toleration below allows the daemonset to run on control plane nodes as well, so that the vmware-system-user account is also set to never expire on the control plane (master) nodes:

      - effect: "NoSchedule"
        key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"

The control plane node is tainted with the node-role.kubernetes.io/control-plane key, rather than being specifically tainted as a master node. All other tolerations are left unchanged.
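To confirm which taint keys the nodes in a given cluster carry, and therefore which tolerations are required, the taints can be listed directly (a quick check outside the procedure above):

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key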
On clusters where the namespace enforces the restricted Pod Security Standard, the daemonset pods may fail to be created with an event similar to the following:

Events:
  Type     Reason        Age  From                  Message
  ----     ------        ---- ----                  -------
  Warning  FailedCreate  ##m  daemonset-controller  Error creating: pods "<cluster-admin-podname>" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (containers "init", "sleep" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "init", "sleep" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volume "hostfs" uses restricted volume type "hostPath"), runAsNonRoot != true (pod or containers "init", "sleep" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "init", "sleep" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
In those situations, as a temporary workaround, relax the pod security setting as shown below on the namespace in which the cluster-admin daemonset was created, then re-apply the daemonset.
Note: The command below assumes that the cluster-admin daemonset was created in the default namespace; replace the namespace value as appropriate.
kubectl label --overwrite ns default pod-security.kubernetes.io/enforce=privileged
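Because relaxing pod security is only a temporary workaround, the enforcement level can be restored once the daemonset has completed on all nodes (a sketch, assuming the namespace previously enforced the restricted profile):

kubectl label --overwrite ns default pod-security.kubernetes.io/enforce=restricted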
Please see the following documentation for more details: Configure PSA for TKR v1.25 and Later