Symptoms:
[2019-10-30T22:59:41.691Z ERROR] [IdmServer] IDM Server has failed to start
com.vmware.identity.interop.ldap.InvalidCredentialsLdapException: Invalid credentials
19-10-30T22:57:51.581345+00:00 err vmdird t@140674263410432: VmDirSendLdapResult: Request (Bind), Error (49), Message ((49)(SASL step failed.)), (0) socket (127.0.0.1)
19-10-30T22:57:51.602017+00:00 err vmdird t@140674263410432: Bind Request Failed (127.0.0.1) error 49: Protocol version: 3, Bind DN: "cn=vc01.test.local,ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL
19-10-30T22:58:01.591518+00:00 err vmdird t@140674263410432: SASLSessionStep: sasl error (-13)(SASL(-13): authentication failure: client evidence does not match what we calculated. Probably a password error)
The vmdird logs show that the copy interval for the data.mdb file is 0:
19-02-01T02:29:15.527397+00:00 info vmdird t@140331222071040: VmDirInitDbCopyThread: database snapshot reg keys: CopyDbWritesMin 1 CopyDbIntervalInSec 0 CopyDbBlockWriteInSec 30
vmdir maintains a copy of the machine account password in both the registry and the mdb file. The mdb file is updated with any changes (such as password changes) at the interval specified by CopyDbIntervalInSec. Because this value is currently set to 0, no mdb file copy is created and the password in the mdb file no longer matches the one in the registry. The machine account password is currently synced every 45 days, so a VCHA failover triggered within that 45-day window causes this issue.
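As a quick check for the condition described above, the following is a minimal shell sketch that extracts the CopyDbIntervalInSec value from the VmDirInitDbCopyThread log line. The function name copy_db_interval and the log path in the usage comment are assumptions made for illustration and do not come from this article; point it at wherever vmdird writes its log on your appliance.

```shell
#!/bin/sh
# Print the most recent CopyDbIntervalInSec value reported by vmdird.
# Reads log text on stdin, so it works with any log file location.
# copy_db_interval is a hypothetical helper name, not a VMware tool.
copy_db_interval() {
    sed -n 's/.*VmDirInitDbCopyThread:.*CopyDbIntervalInSec \([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Example usage (the log path is an assumption; adjust as needed):
# copy_db_interval < /var/log/vmware/vmdird/vmdird-syslog.log
```

A result of 0 corresponds to the state shown in the log excerpt above, where no mdb copy is taken.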
VMware Engineering is aware of this issue and is working on a permanent fix in a future release.
Currently there is no permanent solution; however, a workaround is available.
Workaround:
To work around the issue, perform the following steps in order:
Log in to the Active vCenter node and perform the steps below:
1. Set the CopyDbIntervalInSec registry value using this command:
/opt/likewise/bin/lwregshell set_value '[HKEY_THIS_MACHINE\Services\vmdir\Parameters]' "CopyDbIntervalInSec" "60"
2. Reset the password using the methods described in KB https://knowledge.broadcom.com/external/article?legacyId=2147280
3. Restart the vmdird service as described in KB https://knowledge.broadcom.com/external/article?legacyId=2109887
4. Wait about 5 minutes for replication of the snapshot and registry values to complete.
5. Trigger a manual VCHA failover.
6. Once the Passive node becomes Active, set the CopyDbIntervalInSec registry value on that node (formerly Passive, now Active):
/opt/likewise/bin/lwregshell set_value '[HKEY_THIS_MACHINE\Services\vmdir\Parameters]' "CopyDbIntervalInSec" "60"
7. Restart the vmdird service as described in KB https://knowledge.broadcom.com/external/article?legacyId=2109887
8. Trigger a manual VCHA failover and confirm that the services are accessible.
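Because the same lwregshell command is run on both nodes, it can be wrapped in a small helper. The following is a minimal sketch only: the function name set_copy_db_interval, the LWREGSHELL override variable, and the rule that rejects 0 are assumptions added for illustration. The set_value invocation itself is the one given in the steps above.

```shell
#!/bin/sh
# Path to lwregshell as used in the workaround; the override variable
# is a hypothetical convenience, not part of the KB.
LWREGSHELL="${LWREGSHELL:-/opt/likewise/bin/lwregshell}"
VMDIR_KEY='[HKEY_THIS_MACHINE\Services\vmdir\Parameters]'

# Set CopyDbIntervalInSec (defaults to 60, the value used in the steps).
# Rejecting 0 is an assumption based on the cause described above:
# with an interval of 0, no mdb copy is created.
set_copy_db_interval() {
    interval="${1:-60}"
    case "$interval" in
        ''|*[!0-9]*)
            echo "interval must be a whole number of seconds" >&2
            return 2
            ;;
    esac
    if [ "$interval" -eq 0 ]; then
        echo "an interval of 0 disables the mdb copy" >&2
        return 2
    fi
    "$LWREGSHELL" set_value "$VMDIR_KEY" "CopyDbIntervalInSec" "$interval"
}

# Example usage on the Active node:
# set_copy_db_interval 60
```

The helper only sets the registry value. The password reset, vmdird restart, and failover steps still need to be performed as described in the linked KB articles.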