CSI: Correct sync between CSI pod secret and workload_storage_management user password in vSphere with Tanzu

Article ID: 345473

Updated On: 10-27-2024

Products

VMware vSphere ESXi, VMware vSphere with Tanzu, VMware Tanzu Kubernetes Grid Service (TKGs), VMware vCenter Server 7.0, VMware vCenter Server 8.0

Issue/Introduction

Symptoms:
NOTE: VMware engineering teams are currently investigating the root cause of this issue. If you encounter it, open a case with VMware Support and include a Workload Management log bundle.

TKGS Guest Clusters or Supervisor Clusters fail to create or attach PersistentVolumes or PersistentVolumeClaims to new nodes or pods.

The CSI pods on the TKGS Guest Clusters pass requests for provisioning, attaching, and syncing operations to the CSI pod on the Supervisor Cluster nodes. The Supervisor Cluster CSI pods authenticate to CNS (running on vCenter) using a Solution User named workload_storage_management-<VC_MACHINE_ID>@vsphere.local, which is managed by vCenter SSO. Once authentication succeeds, the Supervisor VMs pass their CSI operations through to vCenter for action.

When reviewing CSI logs on the Guest Cluster and Supervisor Cluster, you see the authentication errors shown in the log excerpts below.

When listing the CSI pod on the Supervisor Cluster or Guest Cluster, you see it in a CrashLoopBackOff state with a high restart count and a READY value lower than 6/6:

 
      kubectl get pods -A | egrep "NAME|csi"
      NAMESPACE             NAME                            READY   STATUS             RESTARTS   AGE
      vmware-system-csi     vsphere-csi-controller-<ID>     5/6     CrashLoopBackOff   6294       103d
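
On a cluster with many pods, it can help to reduce the listing to CSI pods whose containers are not all ready. The following is a minimal sketch and not part of the official procedure: the function name csi_not_ready is hypothetical, and it assumes the default column layout of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper (not a VMware tool): reads `kubectl get pods -A` output
# on stdin and prints CSI pods whose READY count is below the container total.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
csi_not_ready() {
  awk '$2 ~ /csi/ {
         split($3, ready, "/")            # "5/6" -> ready[1]=5, ready[2]=6
         if (ready[1] + 0 < ready[2] + 0)
           print $1 "/" $2, $3, $4        # namespace/pod, READY, STATUS
       }'
}
```

Example use on the Supervisor Cluster: kubectl get pods -A | csi_not_ready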


vsphere-csi-controller log: Found in /var/log/pods/vmware-system-csi_vsphere-csi-controller-<ID>/vsphere-csi-controller/#.log
            

failed to create govmomi client with err: ServerFaultCode: Cannot complete login due to an incorrect user name or password.

failed to connect to VirtualCenter host: \"vcsa-01.fqdn.com\", Err: ServerFaultCode: Cannot complete login due to an incorrect user name or password.


vsphere-syncer log: Found in /var/log/pods/vmware-system-csi_vsphere-csi-controller-<ID>/vsphere-syncer/#.log
 

    failed to create govmomi client with err: ServerFaultCode: Cannot complete login due to an incorrect user name or password.

    Cannot connect to vCenter with err: ServerFaultCode: Cannot complete login due to an incorrect user name or password.

 
vmdird-syslog.log: Found on the vCenter Server at /var/log/vmware/vmdird/vmdird-syslog.log
 
        2024-08-12T11:50:17.780719+05:00 warning vmdird  t@140178253403712: Lockout policy check - account lockout. (cn=workload_storage_management-46daxxxx-318c-4096-8f34-afxxxxxx1,cn=serviceprincipals,dc=vsphere,dc=local)
        2024-08-12T11:50:17.780767+05:00 err vmdird  t@140178253403712: VdirPasswordFailEvent from user(cn=workload_storage_management-46daxxxx-318c-4096-8f34-afxxxxxx1,cn=serviceprincipals,dc=vsphere,dc=local), error(0)()
        2024-08-12T11:50:17.780802+05:00 err vmdird  t@140178253403712: VmDirSendLdapResult: Request (Bind), Error (LDAP_INVALID_CREDENTIALS(49)), Message ((49)(SASL step failed.)), (0) socket (127.0.0.1)
        2024-08-12T11:50:17.780832+05:00 err vmdird  t@140178253403712: Bind Request Failed (127.0.0.1) error 49: Protocol version: 3, Bind DN: "CN=workload_storage_management-46daxxxx-318c-4096-8f34-afxxxxxx1,cn=ServicePrincipals,dc=vsphere,dc=local", Method: SASL
        2024-08-12T11:51:04.847039+05:00 err vmdird  t@140178253403712: SASLSessionStep: sasl error (-13)(SASL(-13): authentication failure: client evidence does not match what we calculated. Probably a password error)
        2024-08-12T11:51:04.849551+05:00 warning vmdird  t@140178253403712: Lockout policy check - account lockout. (cn=workload_storage_management-46daxxxx-318c-4096-8f34-afxxxxxx1,cn=serviceprincipals,dc=vsphere,dc=local)
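
To see quickly which service account is being locked out and how often, the lockout lines can be summarized. This is a rough sketch only: the function name lockout_summary is hypothetical, and it assumes the "Lockout policy check - account lockout" wording shown in the excerpt above.

```shell
# Hypothetical helper: reads vmdird-syslog.log lines on stdin and prints each
# locked-out workload_storage_management account with its lockout event count.
lockout_summary() {
  grep 'Lockout policy check - account lockout' |
    grep -o 'cn=workload_storage_management-[^,]*' |
    sort | uniq -c |
    awk '{ print $2, $1 }'               # "<account cn> <count>"
}
```

Example use on the vCenter Server: lockout_summary < vmdird-syslog.log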
 
 
 

Environment

VMware vSphere with Tanzu

Cause

The failure is caused by a password sync problem for the workload_storage_management user between the Supervisor Cluster and vCenter. It can be triggered by network connectivity failures between the Supervisor (SV) nodes and vCenter. The issue can present in several ways:
 
1. The CSI secret on the Supervisor Cluster does not match the password held in vCenter in /etc/vmware/wcp/.storageUser.

2. The CSI secret on the Supervisor Cluster matches the password in vCenter, but the CSI pod has not been restarted to pick up the correct password.

3. The CSI user in vCenter is locked out due to incorrect login attempts. The passwords may match in this condition, but the locked account prevents logins and reports "incorrect user name or password" errors.
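
The three presentations map onto the workflows in the Resolution section roughly as sketched below. This is an illustrative reading of the article, not a VMware tool: the function name csi_next_step and its output labels are hypothetical.

```shell
# Hypothetical triage helper: maps two checks to the workflow to follow.
# Arguments are "yes" or "no".
#   $1  does the Supervisor CSI secret match /etc/vmware/wcp/.storageUser?
#   $2  is the workload_storage_management account locked in vCenter SSO?
csi_next_step() {
  match=$1 locked=$2
  if [ "$locked" = yes ]; then
    echo "unlock-account"        # a locked account blocks every other fix
  elif [ "$match" = no ]; then
    echo "reset-password"        # secret and vCenter password differ
  else
    echo "restart-csi-pod"       # password is correct but the pod is stale
  fi
}
```

After unlocking the account, run the checks again, because a password mismatch may still remain.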

Resolution

Workaround:
CAUTION: The below steps should be performed with a VMware Support Engineer.  


Determine whether the CSI password on vCenter matches the CSI password in the Supervisor Cluster secret:
 

1. Check the password stored in vsphere-config-secret on the Supervisor Cluster:
 
  • # kubectl get secrets vsphere-config-secret -n vmware-system-csi -o jsonpath='{.data.vsphere-cloud-provider\.conf}' | base64 -d

2. Compare that password with the second line of the following file on vCenter:
 
  • # cat /etc/vmware/wcp/.storageUser
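
The comparison can be made less error-prone by extracting only the password from each source. The helpers below are a minimal sketch with hypothetical names; they assume the password = "..." line format of vsphere-cloud-provider.conf and that the second line of .storageUser holds the password, as described in step 2.

```shell
# Hypothetical helpers for comparing the two passwords.

# Print the password value from a decoded vsphere-cloud-provider.conf on stdin.
conf_password() {
  sed -n 's/^[[:space:]]*password[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' |
    head -n 1
}

# Print the second line of a .storageUser file read on stdin.
storage_user_password() {
  sed -n '2p'
}

# Succeed only if both values are non-empty and identical.
passwords_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}
```

On the Supervisor Cluster, pipe the decoded secret from step 1 into conf_password. On vCenter, run storage_user_password < /etc/vmware/wcp/.storageUser. The two results can then be compared by eye or with passwords_match.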



If the CSI secret on the Supervisor Cluster matches the password in /etc/vmware/wcp/.storageUser:

 
1. If the passwords match, try logging in to the vCenter GUI with the workload_storage_management user and the password from vsphere-config-secret. (Note: If using an SSO domain other than vsphere.local, replace vsphere.local in the username with the vCenter's SSO domain.)
 
  • Example username (replace it with the username from your environment): workload_storage_management-2927599b-1e8a-453c-a5d2-3871cbda9671@vsphere.local

2. If the passwords match and you can log in to vCenter with the service account, delete the CSI pod from the Supervisor Cluster so that it restarts with the secret just verified by the vCenter login:
 
  • # kubectl delete pod vsphere-csi-controller-<ID> -n vmware-system-csi
  • If the CSI pods come back in a healthy state, the problem is resolved. If they return to the same CrashLoopBackOff state, proceed to step 3.

3. If the passwords match but you cannot log in to the vCenter Server with the service account, run the following command from a vCenter SSH session to check whether the user account is locked:
 
  • # /usr/lib/vmware-vmafd/bin/dir-cli user find-by-name --account "workload_storage_management-2927599b-1e8a-453c-a5d2-3871cbda9671" --level 2

4. If the output lists "Account locked: TRUE", unlock the account with the following command. It prompts for a password; enter the password of the vSphere SSO administrator account. If using an SSO domain other than vsphere.local, adjust the dc= components in the command to match the vCenter's SSO domain.
 
(NOTE: This is a single multi-line command and must be entered in its entirety, from /opt through EOF.)
 
  • # /opt/likewise/bin/ldapmodify -x -D cn=Administrator,cn=Users,dc=vsphere,dc=local -W <<EOF
dn: CN=workload_storage_management-2927599b-1e8a-453c-a5d2-3871cbda9671,CN=ServicePrincipals,dc=vsphere,dc=local
changetype: modify
replace: userAccountControl
userAccountControl: 0
EOF
 

5. After unlocking the account, try logging in to vCenter with the workload_storage_management user and the password from vsphere-config-secret.
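
Because the unlock command must be adapted to the account name and SSO domain of each environment, generating the LDIF from those two values can reduce copy-and-paste mistakes. The helpers below are a sketch under stated assumptions: all function names are hypothetical, the LDIF mirrors the one in step 4, and the lock check relies only on the "Account locked: TRUE" wording quoted in step 4.

```shell
# Hypothetical helpers for the unlock step; the names are illustrative only.

# Convert an SSO domain to its LDAP base DN,
# e.g. vsphere.local -> dc=vsphere,dc=local
sso_domain_to_dn() {
  printf '%s\n' "$1" | sed 's/^/dc=/; s/\./,dc=/g'
}

# Print the unlock LDIF for an account name ($1) and SSO domain ($2,
# defaulting to vsphere.local).
unlock_ldif() {
  account=$1 domain=${2:-vsphere.local}
  cat <<EOF
dn: CN=$account,CN=ServicePrincipals,$(sso_domain_to_dn "$domain")
changetype: modify
replace: userAccountControl
userAccountControl: 0
EOF
}

# Succeed if dir-cli output on stdin reports the account as locked.
account_is_locked() {
  grep -qi 'Account locked:[[:space:]]*TRUE'
}
```

Review the output of unlock_ldif before using it as the input to the ldapmodify command in step 4, and adjust the -D bind DN with sso_domain_to_dn in the same way.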
 

If the passwords are not in sync, the password must be reset:
 

1. If the password in /etc/vmware/wcp/.storageUser differs from the secret output and the account has been unlocked, reset the password by changing the number at the bottom of the /etc/vmware/wcp/.storageUser file to 0 and restarting the WCP service:
 
  • # service-control --restart wcp

2. Back up the secret on the Supervisor Cluster so that it can be restored if required:
 
  • # kubectl get secrets vsphere-config-secret -n vmware-system-csi -o jsonpath='{.data.vsphere-cloud-provider\.conf}' |base64 -d > /root/vsphere-config-secret_orig.bak

3. Once the new password is generated, cat /etc/vmware/wcp/.storageUser to retrieve it, then use it to generate a new base64-encoded vsphere-cloud-provider.conf value.
 
  • To build the new value, enter the entire command below at the command prompt on the Supervisor Cluster. Change the environment-specific values (such as the vCenter FQDN, user, password, cluster-id, and datacenters) to match your environment:
    • If using an SSO domain other than vsphere.local, change the domain in the user value to the vCenter's SSO domain.
     - For 7.x
# cat <<EOF | base64 -w 0
[Global]
insecure-flag = "true"
cluster-id = "domain-c8"
cnsregistervolumes-cleanup-intervalinmin = 720
cluster-distribution = "SupervisorCluster"
[VirtualCenter "VCENTER_FQDN"]
user = "workload_storage_management-47462f97-5df2-498d-8256-25043cce1cd8@vsphere.local"
password = "NEW_PASSWORD_HERE"
datacenters = "datacenter-3"
port = "443"
targetvSANFileShareClusters = ""
EOF
 
 
     - For 8.x (supervisor-id must be added)

# cat <<EOF | base64 -w 0
[Global]
insecure-flag = "false"
ca-file = "/etc/vmware/wcp/tls/vmca.pem"
cluster-id = "domain-<id>"
supervisor-id = "supervisor-<>"
cnsregistervolumes-cleanup-intervalinmin = 720
cluster-distribution = "SupervisorCluster"
[VirtualCenter "<vcfqdn>"]
user = "workload_storage_management-<id>@<domain>"
password = "<password>"
datacenters = "datacenter-<id>"
port = "443"
targetvSANFileShareClusters = ""
EOF
 
  • This encodes the configuration as base64 and prints the encoded string, which you then enter as the value of data.vsphere-cloud-provider.conf.
 
4. Run the following to edit the secret:
 
  • # kubectl edit secrets vsphere-config-secret -n vmware-system-csi

5. Delete the existing encoded value after vsphere-cloud-provider.conf and paste the new one created in step 3. Ensure the pasted value is a single line.

6. Use :wq to write the file and quit the editor, which saves the new secret.

7. Delete the CSI pod so that it is recreated with the new secret:
 
  • # kubectl delete pod vsphere-csi-controller-<ID> -n vmware-system-csi
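
Before pasting the encoded value into the secret in step 5, it can be worth confirming that it is a single line and decodes back to a configuration containing the new password. The check below is a sketch only: the function name encoded_conf_ok is hypothetical, and it assumes GNU coreutils base64.

```shell
# Hypothetical check (GNU coreutils base64 assumed): succeed only if the
# encoded value is a single line and decodes to a config with the new password.
#   $1  base64 string produced in step 3
#   $2  new password taken from /etc/vmware/wcp/.storageUser
encoded_conf_ok() {
  # Reject values containing a line break anywhere, including at the end.
  [ "$(printf '%s' "$1" | tr -d '\n')" = "$1" ] || return 1
  # Decode and look for the exact password line.
  printf '%s' "$1" | base64 -d 2>/dev/null |
    grep -qF "password = \"$2\""
}
```

Example use: encoded_conf_ok "<encoded value>" "<new password>" && echo "encoded value looks correct"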

 
PLEASE NOTE: If authentication failures persist with LDAP error 49 (LDAP_INVALID_CREDENTIALS) after the password reset, confirm that the user account has been unlocked. If the account is locked, the password change will fail.



Additional Information



Impact/Risks:
This password sync failure prevents Guest Clusters from creating, attaching, or syncing PVs and PVCs to new nodes and pods. Additionally, the Supervisor Cluster may fail to create new TKCs.