SSH to guest cluster nodes using vmware-system-user and password fails with error "Permission denied, please try again".

Article ID: 432530

Products

VMware vSphere Kubernetes Service

Issue/Introduction

When you SSH into guest cluster nodes as vmware-system-user with the password retrieved from the <cluster-name>-ssh-password secret, the connection fails with the following error:

"Permission denied, please try again"
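
To reproduce the failure, read the password from the secret on the Supervisor cluster. The following is a minimal sketch. It assumes the current kubectl context targets the Supervisor cluster and that the secret stores the password under the ssh-passwordkey data key, as in the example output at the end of this article. get_ssh_password is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the vmware-system-user password stored in the
# <cluster-name>-ssh-password secret. Assumes kubectl targets the Supervisor
# cluster and that the secret's data key is "ssh-passwordkey".
get_ssh_password() {
  local namespace="$1" cluster="$2"
  # Secret data is base64-encoded, so decode it before printing.
  kubectl get secret -n "$namespace" "${cluster}-ssh-password" \
    -o jsonpath='{.data.ssh-passwordkey}' | base64 -d
}
```

For example, get_ssh_password <namespace> <cluster-name> prints the password. Then ssh -o PubkeyAuthentication=no vmware-system-user@<node-ip> forces a password login, which fails with the error above while this issue is present.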

  • However, logging in to the node as vmware-system-user with the SSH key succeeds.
  • Checking the account on the node with chage shows that neither the password nor the account has expired:

vmware-system-user@<node-name>:~$ sudo -i
root@<node-name>:~# chage -l vmware-system-user

(Output shows "Password expires: never", "Account expires: never")

  • The renewalDaysBeforeExpiry setting in the cluster configuration is at its default of 7 days:

# kubectl get cluster -n <namespace> <cluster-name> -o yaml
    - name: osConfiguration
      value:
        ntp:
          servers:
          - ####
        user:
          password:
            renewalDaysBeforeExpiry: 7

  • The update-node-password-runner pods are running and stable, with no restarts observed when running:

# kubectl get pods -A | grep update-node-password-runner
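
The checks above can be run from the Supervisor cluster with two helper functions. This is a minimal sketch, not a supported tool. show_renewal_days and show_runner_pods are hypothetical names. The jsonpath assumes the osConfiguration variable sits under .spec.topology.variables of the Cluster object, as the YAML excerpt suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print renewalDaysBeforeExpiry from the osConfiguration
# variable. Assumes the variable lives under .spec.topology.variables.
show_renewal_days() {
  local namespace="$1" cluster="$2"
  kubectl get cluster -n "$namespace" "$cluster" \
    -o jsonpath='{.spec.topology.variables[?(@.name=="osConfiguration")].value.user.password.renewalDaysBeforeExpiry}'
}

# Hypothetical helper: list update-node-password-runner pods in all namespaces
# with their phase and per-container restart counts.
show_runner_pods() {
  kubectl get pods -A --no-headers \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATUS:.status.phase,RESTARTS:.status.containerStatuses[*].restartCount' |
    grep update-node-password-runner
}
```

If show_runner_pods reports a RESTARTS value of 0 on every run, the pods are in the "running and stable" state described above.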

Environment

vSphere with Tanzu

vSphere Kubernetes Service (VKS) / Tanzu Kubernetes Grid (TKG)

Cause

This issue occurs when the cluster node's actual password becomes out of sync with the password stored in the Kubernetes secret.

The password rotation mechanism operates as a two-phase update:

  1. Phase 1: Rotate the password within the Kubernetes secret.

  2. Phase 2: Create the DaemonSets (update-node-password-runner) to push the updated password down to the guest cluster nodes.

The cluster specification records lastUpdateTimestamp only after Phase 2 succeeds. The out-of-sync state occurs if the cluster specification (specifically osConfiguration.user.password.renewalDaysBeforeExpiry) is modified after Phase 1 but before Phase 2 completes.

When this interruption happens, the DaemonSet does not push the new password to the cluster nodes.

As a result, the password from the secret is rejected at login because the nodes still have the old password configured.
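
To see where the rotation stands, you can print the kubernetes.vmware.com/password-update-last-timestamp annotation (the one cleared in the Resolution below) from each of the three objects. This is a minimal sketch that assumes kubectl targets the Supervisor cluster. show_password_timestamps is a hypothetical name. Treating a missing or mismatched value as a sign of an interrupted rotation is an interpretation, not documented behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the password-update timestamp annotation from the
# SSH password secret, the hashed secret, and the cluster object.
# An empty value means the annotation is not set on that object.
show_password_timestamps() {
  local namespace="$1" cluster="$2"
  # Dots inside the annotation key must be escaped in kubectl jsonpath.
  local path='{.metadata.annotations.kubernetes\.vmware\.com/password-update-last-timestamp}'
  local kind name
  while read -r kind name; do
    printf '%s/%s: %s\n' "$kind" "$name" \
      "$(kubectl get "$kind" -n "$namespace" "$name" -o jsonpath="$path" </dev/null)"
  done <<EOF
secret ${cluster}-ssh-password
secret ${cluster}-ssh-password-hashed
cluster ${cluster}
EOF
}
```

Run it as show_password_timestamps <namespace> <cluster-name>. It prints one line per object.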

Resolution

To resolve this issue, you must clear the password update timestamp annotations from the cluster and associated secrets, and then restart the TKG controller manager.

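Note: Replace <namespace> and <cluster-name> with your Supervisor namespace and Tanzu Kubernetes Cluster name. In Step 2, replace <VKS_namespace> with the Supervisor namespace in which the vmware-system-tkg-controller-manager deployment runs.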

Step 1: Remove the timestamp annotations

Run the following commands against the Supervisor cluster to remove the password-update-last-timestamp annotations:

  1. Remove the annotation from the SSH password secret:
    kubectl annotate secret -n <namespace> <cluster-name>-ssh-password kubernetes.vmware.com/password-update-last-timestamp-

  2. Remove the annotation from the hashed SSH password secret:
    kubectl annotate secret -n <namespace> <cluster-name>-ssh-password-hashed kubernetes.vmware.com/password-update-last-timestamp-

  3. Remove the annotation from the cluster object:
    kubectl annotate cluster -n <namespace> <cluster-name> kubernetes.vmware.com/password-update-last-timestamp-

(Note: The trailing - after the annotation key is required. It tells kubectl to remove that annotation.)
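
You can wrap the three commands in one function to handle all three objects in a single pass. This is a minimal sketch that assumes kubectl targets the Supervisor cluster. clear_password_timestamps is a hypothetical name, and it issues exactly the same kubectl annotate commands as the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper that runs the three Step 1 commands in order.
# "kubectl annotate <kind> <name> <key>-" removes <key> from that object.
clear_password_timestamps() {
  local namespace="$1" cluster="$2"
  local key="kubernetes.vmware.com/password-update-last-timestamp"
  local failed=0 kind name
  while read -r kind name; do
    # Keep going if one object fails so the remaining ones are still cleared.
    kubectl annotate "$kind" -n "$namespace" "$name" "${key}-" </dev/null || failed=1
  done <<EOF
secret ${cluster}-ssh-password
secret ${cluster}-ssh-password-hashed
cluster ${cluster}
EOF
  return "$failed"
}
```

Run it as clear_password_timestamps <namespace> <cluster-name>. A nonzero exit status means at least one of the three commands failed.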

Step 2: Restart the TKG Controller Manager

Perform a rollout restart of the vmware-system-tkg-controller-manager deployment to force a reconciliation:

kubectl rollout restart deployment -n <VKS_namespace> vmware-system-tkg-controller-manager
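
If you do not know the value of <VKS_namespace>, you can look it up from the deployment name. kubectl rollout status can then wait until the restarted controller is ready. This is a minimal sketch that assumes the deployment name is unique across Supervisor namespaces. restart_tkg_controller and the 5-minute timeout are illustrative choices, not values from this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper for Step 2. If no namespace is given, it looks up the
# namespace of the vmware-system-tkg-controller-manager deployment.
restart_tkg_controller() {
  local deploy="vmware-system-tkg-controller-manager"
  local ns="${1:-}"
  if [ -z "$ns" ]; then
    ns="$(kubectl get deployment -A --field-selector "metadata.name=${deploy}" \
      -o jsonpath='{.items[0].metadata.namespace}')"
  fi
  if [ -z "$ns" ]; then
    echo "deployment ${deploy} not found" >&2
    return 1
  fi
  # Restart, then block until the new pods are ready (give up after 5 minutes).
  kubectl rollout restart deployment -n "$ns" "$deploy" &&
    kubectl rollout status deployment -n "$ns" "$deploy" --timeout=5m
}
```

Call restart_tkg_controller <VKS_namespace>, or call restart_tkg_controller with no argument to use the lookup.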

Additional Information

- For related issues involving password runner pods, please refer to the following KB article:

Continuous reconciliation of the daemonset "update-node-password-runner" due to missing VM Class in the cluster namespace

- The following is an example run of Step 1 for the SSH password secret:

1. Before removing the annotation.

# kubectl get secret -n <namespace> <cluster-name>-ssh-password -o yaml
apiVersion: v1
data:
  ssh-passwordkey: ######
kind: Secret
metadata:
  annotations:
    kubernetes.vmware.com/password-update-last-timestamp: "YYYY-MM-DDTHH:MM:SSZ"
  creationTimestamp: "YYYY-MM-DDTHH:MM:SSZ"

2. Command to remove the annotation.

# kubectl annotate secret -n <namespace> <cluster-name>-ssh-password kubernetes.vmware.com/password-update-last-timestamp-
secret/<cluster-name>-ssh-password annotated

3. After removing the annotation.

# kubectl get secret -n <namespace> <cluster-name>-ssh-password -o yaml
apiVersion: v1
data:
  ssh-passwordkey: #######
kind: Secret
metadata:
  creationTimestamp: "YYYY-MM-DDTHH:MM:SSZ"