When attempting to SSH into guest cluster nodes using the vmware-system-user account and the password retrieved from the <cluster-name>-ssh-password secret, the connection fails with the following error:
"Permission denied, please try again"
vmware-system-user@<node-name>:~$ sudo -i
root@<node-name>:~# chage -l vmware-system-user
(Output shows "Password expires: never", "Account expires: never")
renewalDaysBeforeExpiry in the cluster configuration is set to the default of 7 days.
# kubectl get cluster -n <namespace> <cluster-name> -o yaml
    - name: osConfiguration
      value:
        ntp:
          servers:
          - ####
        user:
          password:
            renewalDaysBeforeExpiry: 7
update-node-password-runner pods are running and stable, with no restarts observed when running:
# kubectl get pods -A | grep update-node-password-runner
vSphere with Tanzu
vSphere Kubernetes Service (VKS) / Tanzu Kubernetes Grid (TKG)
This issue occurs when the cluster node's actual password becomes out of sync with the password stored in the Kubernetes secret.
The password rotation mechanism operates as a two-phase update:
Phase 1: Rotate the password within the Kubernetes secret.
Phase 2: Create the DaemonSets (update-node-password-runner) to push the updated password down to the guest cluster nodes.
The cluster specification only records the lastUpdateTimestamp once Phase 2 is successful. An out-of-sync state happens if the cluster specification (specifically osConfiguration.password.renewalDaysBeforeExpiry) is modified right after Phase 1, but before Phase 2 completes.
When this interruption happens, the DaemonSet does not propagate the new password to the cluster nodes.
As a result, you cannot log in with the password from the secret, because the nodes still expect the old password.
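Before changing anything, it can help to see which of the objects involved currently carry the kubernetes.vmware.com/password-update-last-timestamp annotation. The following is a minimal sketch, not part of the documented procedure: show_password_timestamps is a hypothetical helper name, and it assumes kubectl is already pointed at the Supervisor cluster. It only reads the annotation and does not decide whether the cluster is out of sync.

```shell
# Hypothetical helper: print the password-update-last-timestamp annotation
# on the two SSH password secrets and on the cluster object.
# Usage: show_password_timestamps <namespace> <cluster-name>
show_password_timestamps() {
  ns="$1"
  cluster="$2"
  # Dots in the annotation key must be escaped for kubectl's jsonpath.
  key='kubernetes\.vmware\.com/password-update-last-timestamp'

  for target in "secret/${cluster}-ssh-password" \
                "secret/${cluster}-ssh-password-hashed" \
                "cluster/${cluster}"; do
    # An empty result means the annotation is absent (or the lookup failed,
    # in which case kubectl prints the error on stderr).
    value=$(kubectl get "$target" -n "$ns" \
      -o "jsonpath={.metadata.annotations.${key}}")
    printf '%s: %s\n' "$target" "${value:-<not set>}"
  done
}
```

Calling show_password_timestamps <namespace> <cluster-name> prints one line per object, with "<not set>" where the annotation is missing.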
To resolve this issue, you must clear the password update timestamp annotations from the cluster and associated secrets, and then restart the TKG controller manager.
Note: Replace <namespace> and <cluster-name> with your actual Supervisor namespace and Tanzu Kubernetes Cluster name.
Step 1: Remove the timestamp annotations
Run the following commands against the Supervisor cluster to remove the password-update-last-timestamp annotations:
kubectl annotate secret -n <namespace> <cluster-name>-ssh-password kubernetes.vmware.com/password-update-last-timestamp-
kubectl annotate secret -n <namespace> <cluster-name>-ssh-password-hashed kubernetes.vmware.com/password-update-last-timestamp-
kubectl annotate cluster -n <namespace> <cluster-name> kubernetes.vmware.com/password-update-last-timestamp-
(Note: The trailing - at the end of the annotation key is required to remove the annotation.)
Step 2: Restart the TKG Controller Manager
Perform a rollout restart of the vmware-system-tkg-controller-manager deployment to force a reconciliation:
kubectl rollout restart deployment -n <VKS_namespace> vmware-system-tkg-controller-manager
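After the restart, you may want to confirm that the deployment is available again and then read the current password from the secret to retest the SSH login. The sketch below is an optional follow-up rather than a documented step: wait_and_get_password is a hypothetical helper name, the 300-second timeout is an arbitrary choice, and the ssh-passwordkey data key is taken from the secret output shown in this article. The article does not state how long reconciliation takes, so the login may not succeed immediately.

```shell
# Hypothetical helper: wait for the controller manager rollout to finish,
# then print the decoded SSH password from the cluster's secret.
# Usage: wait_and_get_password <VKS_namespace> <namespace> <cluster-name>
wait_and_get_password() {
  vks_ns="$1"
  ns="$2"
  cluster="$3"

  # Progress messages go to stderr so stdout carries only the password.
  kubectl rollout status deployment -n "$vks_ns" \
    vmware-system-tkg-controller-manager --timeout=300s >&2 || return 1

  # Secret data is base64-encoded; decode it before use.
  kubectl get secret -n "$ns" "${cluster}-ssh-password" \
    -o 'jsonpath={.data.ssh-passwordkey}' | base64 -d
}
```

The printed value can then be used with the vmware-system-user account to retry the SSH connection to a node.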
- For related issues involving password runner pods, please refer to the following KB article:
- Below is an example run of the resolution for the <cluster-name>-ssh-password secret:
1. Before removing the annotation.
# kubectl get secret -n <namespace> <cluster-name>-ssh-password -o yaml
apiVersion: v1
data:
  ssh-passwordkey: ######
kind: Secret
metadata:
  annotations:
    kubernetes.vmware.com/password-update-last-timestamp: "YYYY-MM-DDTHH:MM:SSZ"
  creationTimestamp: "YYYY-MM-DDTHH:MM:SSZ"
2. Command to remove the annotation.
# kubectl annotate secret -n <namespace> <cluster-name>-ssh-password kubernetes.vmware.com/password-update-last-timestamp-
secret/<cluster-name>-ssh-password annotated
3. After removing the annotation.
# kubectl get secret -n <namespace> <cluster-name>-ssh-password -o yaml
apiVersion: v1
data:
  ssh-passwordkey: #######
kind: Secret
metadata:
  creationTimestamp: "YYYY-MM-DDTHH:MM:SSZ"