This article provides steps to check whether the wcp-cluster-user-domain user account password is in sync, a workaround to unlock the account, and guidance to reset the password manually if unlocking does not resolve the issue and the passwords are out of sync.
The "wcp-cluster-user-domain" user account is used by vSphere with Tanzu to allow the wcp-schedext pod to authenticate with vCenter in order to translate scheduler operations into DRS. If the wcp-cluster-user-domain account is locked, or if the password is out of sync, vSphere Supervisor Clusters will not be able to synchronize the wcp-schedext pod against DRS for scheduling decisions. This will lead to Supervisor Cluster pods hanging in Pending state.
The wcp-schedext pod logs might show errors like:
YYYY-MM-DDTHH:MM:SS stderr F YYYY-MM-DDTHH:MM:SS error schedext [opID=cfgMapUpdate-40a0] Could not login to vCenter. Error: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
/var/log/vmware/vmdird/vmdird-syslog.log might show errors like:
YYYY-MM-DDTHH:MM:SS err vmdird t@139774299973376: SASLSessionStep: sasl error (-13)(SASL(-13): authentication failure: client evidence does not match what we calculated. Probably a password error)
YYYY-MM-DDTHH:MM:SS warning vmdird t@139774299973376: Lockout policy check - account lockout. (cn=wcp-cluster-user-domain-<ClusterID>-<VC_MachineID>,cn=serviceprincipals,dc=domain,dc=local)
VMware vSphere 7.0 with Tanzu
The password sync and lockout failure is a very rare condition, and the root cause is still under investigation. This condition has been observed most commonly after Supervisor Cluster certificates expire.
Workaround:
Check WCP-Cluster-User-Domain account lock status:
From vCenter SSH: Check the wcp logs and gather the wcp-cluster-user-domain account ID. The logs are located on vCenter at /var/log/vmware/wcp/wcpsvc.log
wcp-cluster-user-domain-c#-#####-####-####-####-#########@domain.local
c# is the ClusterID on which WCP was built. The #####-####-####-####-######### is the vCenter MachineID.
From Supervisor VM: Check wcp-schedext pod logs on the Supervisor cluster to see if they're reporting login failures:
kubectl logs -n kube-system kube-scheduler-<POD_ID> -c wcp-schedext | less
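The two log checks above can be narrowed with a couple of small filters. This is a minimal sketch, not part of the documented procedure: the function names extract_wcp_accounts and filter_login_failures are hypothetical, and the account name pattern is an assumption based on the wcp-cluster-user-domain-<ClusterID>-<VC_MachineID> format described above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the account pattern is an
# assumption derived from the wcp-cluster-user-domain-<ClusterID>-<VC_MachineID> format.

# Print each distinct wcp-cluster-user-domain account name found on stdin.
extract_wcp_accounts() {
    grep -oE 'wcp-cluster-user-domain-[A-Za-z0-9-]+' | sort -u
}

# Print only the log lines on stdin that report a failed vCenter login.
# "|| true" keeps the exit status at 0 when nothing matches.
filter_login_failures() {
    grep -F 'Could not login to vCenter' || true
}

# Example usage (not run here):
#   extract_wcp_accounts < /var/log/vmware/wcp/wcpsvc.log
#   kubectl logs -n kube-system kube-scheduler-<POD_ID> -c wcp-schedext | filter_login_failures
```

An empty result from the second filter suggests the pod is not currently reporting login failures, in which case the remaining checks may not apply.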
From vCenter SSH: Check /var/log/vmware/vmdird/vmdird-syslog.log to see if the account is locked. If it is, you will see messages like the following:
YYYY-MM-DDTHH:MM:SS warning vmdird t@140502791870208: LoginBlocked DN (cn=wcp-cluster-user-domain-<ClusterID>-<VC_MachineID>,cn=serviceprincipals,dc=domain,dc=local), error (9241)(Account access blocked)
YYYY-MM-DDTHH:MM:SS err vmdird t@139774291580672: VmDirSendLdapResult: Request (Bind), Error (LDAP_INVALID_CREDENTIALS(49)), Message ((49)(SASL step failed.)), (0) socket (127.0.0.1)
YYYY-MM-DDTHH:MM:SS err vmdird t@139774291580672: Bind Request Failed (127.0.0.1) error 49: Protocol version: 3, Bind DN: "CN=wcp-cluster-user-domain-<ClusterID>-<VC_MachineID>,cn=ServicePrincipals,dc=domain,dc=local", Method: SASL
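Because vmdird-syslog.log can be large, it may help to filter it down to the lockout messages for this account. The following is a sketch under assumptions: find_lockout_events is a hypothetical helper, and the match strings are taken only from the sample messages shown in this article, so other lockout wording would not be caught.

```shell
#!/bin/sh
# Sketch only: find_lockout_events is a hypothetical helper. The match strings
# come from the sample log lines in this article and may not be exhaustive.

# Print vmdird log lines on stdin that indicate a lockout or blocked login
# for a wcp-cluster-user-domain service principal.
find_lockout_events() {
    grep -i 'cn=wcp-cluster-user-domain-' \
        | grep -E 'Lockout policy check - account lockout|LoginBlocked DN|Account access blocked' \
        || true
}

# Example usage (not run here):
#   find_lockout_events < /var/log/vmware/vmdird/vmdird-syslog.log | tail -n 20
```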
From vCenter SSH: Confirm the account lock status with dir-cli:
/usr/lib/vmware-vmafd/bin/dir-cli user find-by-name --account wcp-cluster-user-domain-c#-#####-####-####-####-######### --level 2
Output will look like:
Account: wcp-cluster-user-domain-c#-#####-####-####-####-#########
UPN: wcp-cluster-user-domain-c#-#####-####-####-####-#########@domain.local
Account disabled: FALSE
Account locked: TRUE
Password never expires: FALSE
Password expired: FALSE
Password expiry: 9998 day(s) 19 hour(s) 57 minute(s) 58 second(s)
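The field to look at in that output is "Account locked". As a minimal sketch, the output can be piped through a small parser; account_lock_state is a hypothetical helper, and it assumes the field is printed exactly as "Account locked: TRUE" or "Account locked: FALSE" on its own line, as in the sample above.

```shell
#!/bin/sh
# Sketch only: account_lock_state is a hypothetical helper. It assumes the
# dir-cli output contains a line of the form "Account locked: TRUE|FALSE".

# Read dir-cli "user find-by-name ... --level 2" output on stdin and print
# "locked", "unlocked", or "unknown" (when the field is missing).
account_lock_state() {
    awk -F': *' '
        /^Account locked:/ {
            v = $2
            sub(/[ \t\r]+$/, "", v)      # drop trailing whitespace or CR
            state = (v == "TRUE") ? "locked" : "unlocked"
        }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Example usage (not run here; dir-cli is expected to prompt for credentials):
#   /usr/lib/vmware-vmafd/bin/dir-cli user find-by-name --account "$ACCOUNT" --level 2 | account_lock_state
```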
If the account is locked, unlock it from vCenter SSH:
/opt/likewise/bin/ldapmodify -x -D cn=Administrator,cn=Users,dc=vsphere,dc=local -W <<EOF
dn: CN=wcp-cluster-user-domain-c#-#####-####-####-####-#########,CN=ServicePrincipals,dc=domain,dc=local
changetype: modify
replace: userAccountControl
userAccountControl: 0
EOF
If the issue still persists after completing the above steps, please raise a case with Broadcom Support.
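To reduce the chance of a typo in the DN, the LDIF for the unlock step can be generated from the account name gathered earlier. This is only a sketch: build_unlock_ldif is a hypothetical helper, the domain DN is passed in rather than guessed, and piping the result to ldapmodify is assumed to behave the same as the here-document form used above.

```shell
#!/bin/sh
# Sketch only: build_unlock_ldif is a hypothetical helper. It prints LDIF and
# changes nothing by itself.

# Print the LDIF that resets userAccountControl to 0 for one service principal.
#   $1: account name, e.g. wcp-cluster-user-domain-<ClusterID>-<VC_MachineID>
#   $2: domain DN, e.g. dc=domain,dc=local
build_unlock_ldif() {
    account=$1
    domain_dn=$2
    # Refuse anything that is not a wcp-cluster-user-domain account.
    case $account in
        wcp-cluster-user-domain-?*) ;;
        *) echo "unexpected account name: $account" >&2; return 1 ;;
    esac
    printf 'dn: CN=%s,CN=ServicePrincipals,%s\n' "$account" "$domain_dn"
    printf 'changetype: modify\n'
    printf 'replace: userAccountControl\n'
    printf 'userAccountControl: 0\n'
}

# Example usage (not run here; review the printed LDIF before applying it):
#   build_unlock_ldif "$ACCOUNT" dc=domain,dc=local
#   build_unlock_ldif "$ACCOUNT" dc=domain,dc=local | \
#     /opt/likewise/bin/ldapmodify -x -D cn=Administrator,cn=Users,dc=vsphere,dc=local -W
```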