TKGS wcp-cluster-user-domain User account password unlock and reset procedure

Article ID: 305321

Products

VMware vSphere ESXi, VMware vSphere Kubernetes Service

Issue/Introduction

This article provides steps to check whether the wcp-cluster-user-domain user account password is in sync, a workaround to unlock the account, and steps to reset the password manually if unlocking does not resolve the issue and the passwords remain out of sync.

  • The "wcp-cluster-user-domain" user account is used by vSphere with Tanzu to allow the wcp-schedext pod to authenticate with vCenter in order to translate scheduler operations into DRS. If the wcp-cluster-user-domain account is locked, or if its password is out of sync, vSphere Supervisor Clusters cannot synchronize the wcp-schedext pod with DRS for scheduling decisions. As a result, Supervisor Cluster pods hang in the Pending state.

  • You might see errors like the following in the wcp-schedext pod logs on the Supervisor Cluster:
YYYY-MM-DDTHH:MM:SS stderr F YYYY-MM-DDTHH:MM:SS error schedext [opID=cfgMapUpdate-40a0] Could not login to vCenter. Error: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
  • On vCenter Server, /var/log/vmware/vmdird/vmdird-syslog.log might show errors like the following:
YYYY-MM-DDTHH:MM:SS err vmdird  t@139774299973376: SASLSessionStep: sasl error (-13)(SASL(-13): authentication failure: client evidence does not match what we calculated. Probably a password error)
YYYY-MM-DDTHH:MM:SS warning vmdird  t@139774299973376: Lockout policy check - account lockout. (cn=wcp-cluster-user-domain-<ClusterID>-<VC_MachineID>,cn=serviceprincipals,dc=domain,dc=local)
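The three messages above can be told apart mechanically when scanning a large log. The following is a minimal shell sketch, not a Broadcom-supplied tool: the function name classify_wcp_auth_lines is hypothetical, and the match strings are copied from the example messages in this article, so they may need adjusting if your logs are worded differently.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print "<symptom>: <line>"
# for each line matching one of the example messages in this article.
# The match strings are an assumption based on those examples.
classify_wcp_auth_lines() {
  awk '
    # wcp-schedext cannot log in to vCenter
    /Could not login to vCenter/                   { print "schedext-login-failure: " $0; next }
    # vmdird SASL failure, which usually indicates a password mismatch
    /client evidence does not match/               { print "sasl-password-mismatch: " $0; next }
    # vmdird lockout policy triggered for a wcp-cluster-user-domain account
    /account lockout/ && /wcp-cluster-user-domain/ { print "account-lockout: " $0; next }
  '
}
```

For example, on vCenter run classify_wcp_auth_lines < /var/log/vmware/vmdird/vmdird-syslog.log, and on the Supervisor VM pipe the kubectl logs output for the wcp-schedext container into the same function.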

Environment

VMware vSphere 7.0 with Tanzu

Cause

The password sync and lockout failure is a very rare condition, and the root cause is still under investigation. It has been identified most commonly after Supervisor Cluster certificates expire.

If the wcp-cluster-user-domain account is locked, or if its password is out of sync, vSphere Supervisor Clusters cannot synchronize the wcp-schedext pod with DRS for scheduling decisions, and Supervisor Cluster pods hang in the Pending state.

Resolution

Currently, the resolution is to wait for the automated wcp-cluster-user-domain account password sync, which runs every 12 hours. If this is blocking time-critical operations, apply the workaround below to speed up the process.

Workaround:

Check the wcp-cluster-user-domain account lock status:

  • From an SSH session on vCenter: check the WCP service log, /var/log/vmware/wcp/wcpsvc.log, and note the wcp-cluster-user-domain account ID.

    • Example of a wcp-cluster-user-domain user ID: wcp-cluster-user-domain-c#-#####-####-####-####-#########@domain.local
    • c# is the ClusterID of the cluster on which WCP was enabled, and #####-####-####-####-######### is the vCenter MachineID.
  • From the Supervisor VM: check the wcp-schedext container logs in the kube-scheduler pod to see whether they report login failures:

kubectl logs -n kube-system kube-scheduler-<POD_ID> -c wcp-schedext | less

  • From vCenter SSH: check /var/log/vmware/vmdird/vmdird-syslog.log to see whether the account is locked. If it is, you will see messages like the following:

YYYY-MM-DDTHH:MM:SS warning vmdird  t@140502791870208: LoginBlocked DN (cn=wcp-cluster-user-domain-<ClusterID>-<VC_MachineID>,cn=serviceprincipals,dc=domain,dc=local), error (9241)(Account access blocked)
YYYY-MM-DDTHH:MM:SS err vmdird  t@139774291580672: VmDirSendLdapResult: Request (Bind), Error (LDAP_INVALID_CREDENTIALS(49)), Message ((49)(SASL step failed.)), (0) socket (127.0.0.1)
YYYY-MM-DDTHH:MM:SS err vmdird  t@139774291580672: Bind Request Failed (127.0.0.1) error 49: Protocol version: 3, Bind DN: "CN=wcp-cluster-user-domain-<ClusterID>-<VC_MachineID>,cn=ServicePrincipals,dc=domain,dc=local", Method: SASL 
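The account name needed for dir-cli appears in both wcpsvc.log and the vmdird messages above. A minimal shell sketch for pulling it out and checking for LoginBlocked entries follows; the function names are hypothetical, and the name pattern assumes accounts of the form wcp-cluster-user-domain-c<N>-<vCenter MachineID>, as in the examples in this article. A LoginBlocked match only shows that the account was blocked when that entry was written; the dir-cli check reports the current state.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct wcp-cluster-user-domain account name
# found in the log text on stdin. The pattern is an assumption based on the
# example names in this article (c<N> cluster ID, then the vCenter MachineID).
extract_wcp_accounts() {
  grep -oE 'wcp-cluster-user-domain-c[0-9]+-[0-9a-fA-F-]+' | sort -u
}

# Hypothetical helper: succeed (exit 0) if the vmdird log text on stdin has a
# LoginBlocked entry for the account named in $1 (case-insensitive match).
wcp_account_blocked() {
  grep -qiF -- "LoginBlocked DN (cn=$1,"
}
```

For example: extract_wcp_accounts < /var/log/vmware/wcp/wcpsvc.log, then wcp_account_blocked <account> < /var/log/vmware/vmdird/vmdird-syslog.log && echo "LoginBlocked entries found".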

  • If the account is reported as locked, check the user status with dir-cli from the vCenter SSH session:

/usr/lib/vmware-vmafd/bin/dir-cli user find-by-name --account wcp-cluster-user-domain-c#-#####-####-####-####-######### --level 2
 
The output will look similar to the following:

Account: wcp-cluster-user-domain-c#-#####-####-####-####-#########
UPN: wcp-cluster-user-domain-c#-#####-####-####-####-#########@domain.local
Account disabled: FALSE
Account locked: TRUE
Password never expires: FALSE
Password expired: FALSE
Password expiry: 9998 day(s) 19 hour(s) 57 minute(s) 58 second(s)
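When checking several accounts, or re-checking after the unlock, the "Account locked" field can be tested from a script. This is a minimal sketch: the function name dircli_account_locked is hypothetical, and the field name is taken from the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only if the dir-cli "user find-by-name"
# output on stdin contains "Account locked: TRUE". Trailing spaces and CR
# characters are ignored.
dircli_account_locked() {
  awk '
    /^Account locked:/ {
      value = $0
      sub(/^Account locked:[ \t]*/, "", value)
      sub(/[ \t\r]+$/, "", value)
      found = 1
      locked = (value == "TRUE")
    }
    END { if (found && locked) exit 0; exit 1 }
  '
}
```

For example, save the dir-cli output to a file and run dircli_account_locked < <saved output file> && echo "still locked". Exit status 1 means the field was FALSE or missing.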

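  • If the account shows as locked, run the following command to unlock it. Everything between <<EOF and the final EOF line is passed to ldapmodify as input, and -W prompts for the Administrator password. Replace the account name and the dc= components with the values for your environment: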
     
    /opt/likewise/bin/ldapmodify -x -D cn=Administrator,cn=Users,dc=vsphere,dc=local -W <<EOF
    dn: CN=wcp-cluster-user-domain-c#-#####-####-####-####-#########,CN=ServicePrincipals,dc=domain,dc=local
    changetype: modify
    replace: userAccountControl
    userAccountControl: 0
    EOF
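Editing the DN by hand is error-prone, so the same LDIF can be generated from the account name and SSO domain. This is a minimal sketch under stated assumptions: the function name make_unlock_ldif is hypothetical, and it assumes the account lives under CN=ServicePrincipals in the SSO domain, exactly as in the heredoc above.

```shell
#!/bin/sh
# Hypothetical helper: print the same LDIF as the heredoc above.
#   $1: account name, e.g. wcp-cluster-user-domain-c#-<vCenter MachineID>
#   $2: SSO domain, e.g. vsphere.local (shown as domain.local in this article)
make_unlock_ldif() {
  # Turn an SSO domain such as vsphere.local into dc=vsphere,dc=local.
  unlock_base_dn=$(printf '%s\n' "$2" | sed 's/^/dc=/; s/\./,dc=/g')
  printf 'dn: CN=%s,CN=ServicePrincipals,%s\n' "$1" "$unlock_base_dn"
  printf 'changetype: modify\n'
  printf 'replace: userAccountControl\n'
  # Same value the heredoc above uses to unlock the account.
  printf 'userAccountControl: 0\n'
}
```

For example: make_unlock_ldif <account> domain.local | /opt/likewise/bin/ldapmodify -x -D cn=Administrator,cn=Users,dc=vsphere,dc=local -W. Re-running the dir-cli check afterwards should show "Account locked: FALSE" if the unlock succeeded.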

If the issue persists after completing the steps above, open a case with Broadcom Support.