Lookupsvc fails to start due to VMDIR Database inconsistency in linked vCenter server

Article ID: 437296

Products

VMware vCenter Server

Issue/Introduction

  • The lookupsvc service fails to start on a vCenter Server in Enhanced Linked Mode (ELM). Manually starting all services with "service-control --start --all" also hangs at lookupsvc.



  • In /var/log/vmware/sso/lookupsvc.log, entries similar to the following are observed:

YYYY-MM-DDThh:mm:ss pool-2-thread-2 com.vmware.sso.interop.ldap.LdapErrorChecker] Error received by LDAP client: com.vmware.sso.interop.ldap.OpenLdapClientLibrary, error code: -1
YYYY-MM-DDThh:mm:ss pool-2-thread-2 ERROR com.vmware.vim.lookup.impl.LdapStorage] LDAP action failed; host=<Affected_Vc_Fqdn>, port=389 com.vmware.sso.interop.ldap.ServerDownLdapException: Can't contact LDAP server

  • In /var/log/vmware/vmafd/vmafdvmdirclient.log, entries similar to the following might be observed:

YYYY-MM-DDThh:mm:ss:t@####: ERROR: VmDirAnonymousLDAPBindEx to (ldap://<Affected_Vc_Fqdn>:389) failed. (-1) (Can't contact LDAP server)
YYYY-MM-DDThh:mm:ss:t@####: ERROR: VmDirGetDSERootAttributeEx failed with error (9127)

  • In /var/log/vmware/vmdir/vmdird.log, entries similar to the following might be observed:

    YYYY-MM-DDThh:mm:ss:t@####: WARNING: Connection accept thread: Have NOT yet started listening on LDAP port (636), waiting for the 1st replication cycle to be over.
    YYYY-MM-DDThh:mm:ss:t@####: INFO: VmDir State (3)
    YYYY-MM-DDThh:mm:ss:t@####: INFO: VmDir read-only reason (0)
    YYYY-MM-DDThh:mm:ss:t@####:WARNING: Connection accept thread: Have NOT yet started listening on LDAP port (389), waiting for the 1st replication cycle to be over.
    YYYY-MM-DDThh:mm:ss:t@####:INFO: Lotus Vmdird: running ... state (3)
  • Comparing the vmdir database file /storage/db/vmware-vmdir/data.mdb across the linked vCenter Servers shows that the file on the affected vCenter Server is significantly smaller than on its ELM replication partners.
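
The checks above can be sketched as a few small bash helpers. This is only an illustration: the function names are hypothetical and are not part of any VMware tool, and the 50% default threshold is an arbitrary assumption standing in for "significantly smaller".

```shell
#!/usr/bin/env bash
# Hypothetical helpers for confirming the symptoms; not part of any VMware utility.

# Count the lines in a log file that contain the LDAP connectivity error.
count_ldap_errors() {
    local logfile="$1"
    # grep -c prints 0 and exits non-zero when nothing matches; "|| true" ignores that.
    grep -c "Can't contact LDAP server" "$logfile" || true
}

# Print the size of a file in bytes.
file_size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Succeed when the local size is below threshold_pct percent of the partner size.
# The default of 50 is an assumed cut-off, not a documented value.
is_much_smaller() {
    local local_bytes="$1" partner_bytes="$2" threshold_pct="${3:-50}"
    [ $(( local_bytes * 100 )) -lt $(( partner_bytes * threshold_pct )) ]
}
```

For example, run "count_ldap_errors /var/log/vmware/sso/lookupsvc.log" on the affected node, then run "file_size_bytes /storage/db/vmware-vmdir/data.mdb" on each linked vCenter Server and pass the two results to is_much_smaller.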

Cause

The vmdir database on the affected node has become inconsistent, potentially because of an inconsistent snapshot revert in which only one node in the ELM domain was restored. As the log entries above show, vmdir then does not start listening on its LDAP ports, so dependent services such as lookupsvc fail to start.

Resolution

  1. Take offline snapshots of all linked vCenter Servers.

  2. On the affected vCenter Server, move the corrupted vmdir database file out of the way to a different location:
    mv /storage/db/vmware-vmdir/data.mdb /var/core/data.mdb.old

  3. Run the fixpsc utility with the rebuild option, pointing it at one of the healthy linked vCenter Servers, to recreate the vmdir database files:
    ./fixpsc rebuild --healthy-psc-fqdn <healthy_linked_vc_fqdn>

  4. Run the lsdoctor utility with the -r and -u parameters to recreate the service registrations and solution users:
    python lsdoctor.py -r
    python lsdoctor.py -u

  5. Restart the services:
    service-control --stop --all && service-control --start --all
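
As a cautious variant of the move in step 2, the sketch below refuses to overwrite an earlier backup and fails if the source file is missing. The helper name move_aside is hypothetical and the function is generic bash, not something shipped with vCenter Server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a file aside without clobbering an existing backup.
move_aside() {
    local src="$1" dest="$2"
    if [ ! -f "$src" ]; then
        echo "source not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "destination already exists: $dest" >&2
        return 2
    fi
    # "--" stops option parsing in case a path starts with a dash.
    mv -- "$src" "$dest"
}
```

With the paths from step 2, the call would be "move_aside /storage/db/vmware-vmdir/data.mdb /var/core/data.mdb.old". A non-zero return code means nothing was moved.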

Additional Information

VMware vCenter in Enhanced Linked Mode pre-changes snapshot (online or offline) best practice