VCHA failover to passive node fails to bring up SSO services due to vmdird LDAP error 49

Article ID: 313942

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
  • You run a manual VCHA failover to the passive node.
  • The VCHA failover completes successfully; however, the vCenter web client page shows the error "503 Service Unavailable".
  • The vmware-sts-idmd logs show that the idmd service fails to start:
[2019-10-30T22:59:41.691Z ERROR] [IdmServer] IDM Server has failed to start
com.vmware.identity.interop.ldap.InvalidCredentialsLdapException: Invalid credentials

  • The vmdird logs show password errors for the vCenter machine account:
19-10-30T22:57:51.581345+00:00 err vmdird t@140674263410432: VmDirSendLdapResult: Request (Bind), Error (49), Message ((49)(SASL step failed.)), (0) socket (127.0.0.1)
19-10-30T22:57:51.602017+00:00 err vmdird t@140674263410432: Bind Request Failed (127.0.0.1) error 49: Protocol version: 3, Bind DN: "cn=vc01.test.local,ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL
19-10-30T22:58:01.591518+00:00 err vmdird t@140674263410432: SASLSessionStep: sasl error (-13)(SASL(-13): authentication failure: client evidence does not match what we calculated. Probably a password error)
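
To check whether a vmdird log shows this pattern, the log can be filtered for bind requests that failed with LDAP error 49. The following is a minimal POSIX shell sketch, assuming the log lines have the format shown above. The function name `failed_bind_dns` is hypothetical (not a VMware tool), and because this article does not give the log file location, the function reads log text from standard input.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): read vmdird log lines on stdin and
# print each distinct Bind DN whose bind request failed with LDAP error 49.
# Assumes the 'Bind Request Failed ... error 49: ... Bind DN: "..."' format
# shown in the symptoms above.
failed_bind_dns() {
  sed -n 's/.*Bind Request Failed .* error 49: .*Bind DN: "\([^"]*\)".*/\1/p' |
    sort -u
}
```

For the log lines above, `failed_bind_dns < /path/to/vmdird.log` (placeholder path) would print `cn=vc01.test.local,ou=Domain Controllers,dc=vsphere,dc=local`, i.e. the vCenter machine account. The `VmDirSendLdapResult` and `SASLSessionStep` lines are intentionally not matched, so each failing account is listed once.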

 


Environment

VMware vCenter Server Appliance 6.5.x
VMware Update Manager 6.5

Cause

The vmdird logs show that the copy interval for the data.mdb file is set to 0:

19-02-01T02:29:15.527397+00:00 info vmdird t@140331222071040: VmDirInitDbCopyThread: database snapshot reg keys: CopyDbWritesMin 1 CopyDbIntervalInSec 0 CopyDbBlockWriteInSec 30

vmdir maintains a copy of the machine account password in both the registry and the data.mdb file. The mdb file is updated with any changes (such as password changes) at the interval specified by CopyDbIntervalInSec. Because this value is set to 0, no mdb snapshot file is created, and the password stored in the mdb file no longer matches the one in the registry. The current sync interval for the machine account password is 45 days; if a VCHA failover is triggered within 45 days, this issue occurs.
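
Because the snapshot interval is reported in the `VmDirInitDbCopyThread` message when vmdird starts, one way to see which value vmdird loaded is to extract `CopyDbIntervalInSec` from that message. The sketch below is a POSIX shell example under stated assumptions: the function names are hypothetical, it relies on the log format shown above, and it treats the last matching line as the value from the most recent vmdird start.

```shell
#!/bin/sh
# Hypothetical helpers (not VMware tools). Both read vmdird log lines on stdin.

# Print the CopyDbIntervalInSec value from the most recent
# "VmDirInitDbCopyThread: database snapshot reg keys" message, if any.
copy_db_interval() {
  sed -n 's/.*VmDirInitDbCopyThread: .*CopyDbIntervalInSec \([0-9][0-9]*\).*/\1/p' |
    tail -n 1
}

# Classify the loaded value. 0 means no data.mdb snapshot is copied,
# which is the condition behind this issue.
check_copy_db_interval() {
  interval=$(copy_db_interval)
  if [ -z "$interval" ]; then
    echo "unknown: no VmDirInitDbCopyThread message found"
  elif [ "$interval" -eq 0 ]; then
    echo "affected: CopyDbIntervalInSec is 0"
  else
    echo "configured: CopyDbIntervalInSec is $interval"
  fi
}
```

Fed the log line above, `check_copy_db_interval` reports `affected: CopyDbIntervalInSec is 0`. After the workaround sets the value to 60 and vmdird is restarted, a new startup message would be expected to show the new value.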

Resolution

VMware Engineering is aware of this issue and is working on a permanent fix in a future release.
There is currently no permanent solution; however, a workaround is available.




Workaround:
To work around the issue, perform the following steps in order:

Log in to the active vCenter Server node and perform the steps below:

1. Set the CopyDbIntervalInSec registry value using this command:
   /opt/likewise/bin/lwregshell set_value '[HKEY_THIS_MACHINE\Services\vmdir\Parameters]' "CopyDbIntervalInSec" "60"
2. Reset the password using the methods described in KB https://kb.vmware.com/s/article/2147280
3. Restart the vmdird service as described in KB https://kb.vmware.com/s/article/2109887
4. Wait approximately 5 minutes for replication of the snapshot and registry values to complete.
5. Trigger a manual VCHA failover.
6. Once the passive node becomes active, set the CopyDbIntervalInSec registry value on that node (the former passive node, now active):
   /opt/likewise/bin/lwregshell set_value '[HKEY_THIS_MACHINE\Services\vmdir\Parameters]' "CopyDbIntervalInSec" "60"
7. Restart the vmdird service as described in KB https://kb.vmware.com/s/article/2109887
8. Trigger another manual VCHA failover and confirm that the services are accessible.
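
The workaround runs the same `lwregshell set_value` command on both nodes. If you script it, a small wrapper can refuse a zero or non-numeric interval, since 0 is the value that disables the snapshot. This is a minimal sketch, not a VMware-provided tool: the function name `set_copy_db_interval` and the `LWREGSHELL` override variable are hypothetical, and the wrapper only issues the registry command already shown in the steps. Resetting the password and restarting vmdird still follow the linked KB articles.

```shell
#!/bin/sh
# Hypothetical wrapper (not a VMware tool) around the lwregshell command used
# in the workaround. LWREGSHELL may be overridden, e.g. with a function that
# only prints its arguments, for a dry run.
set_copy_db_interval() {
  seconds=${1:-60}   # 60 is the value used in this article
  case $seconds in
    *[!0-9]*)
      echo "error: interval must be a whole number of seconds" >&2
      return 1 ;;
  esac
  if [ "$seconds" -eq 0 ]; then
    # 0 means no data.mdb snapshot is copied, which causes this issue
    echo "error: CopyDbIntervalInSec must be greater than 0" >&2
    return 1
  fi
  "${LWREGSHELL:-/opt/likewise/bin/lwregshell}" set_value \
    '[HKEY_THIS_MACHINE\Services\vmdir\Parameters]' \
    "CopyDbIntervalInSec" "$seconds"
}
```

On each node, `set_copy_db_interval 60` would run the same command as steps 1 and 6, while `set_copy_db_interval 0` returns an error without touching the registry. Validating before calling lwregshell keeps a typo from silently reintroducing the 0 value.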