Understanding LDAP failover

Article ID: 51638

Products

SITEMINDER

Issue/Introduction

What do the following entries in smstracedefault.log mean?

[03/30/2010][09:36:32][2388][3316][][][][Current ip: <LDAPServer1>:<port>, ts:1269934592; best ip: <LDAPServer2>:<port>, ts: 1269934568]
[03/30/2010][09:36:32][2388][3316][][][][Failing over to LDAP server '<LDAPServer2>:<port>' in LDAP server bank #1.]

 

Environment

Policy Server: 12.8.x

Resolution

To optimize response time, the Policy Server runs a background algorithm that determines the best server (the "best ip" in the trace) among the LDAP servers of the User Directories. The LDAP server identified as "best ip" becomes the next one on the round-robin list, and the second trace line reports the switch to that server.
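When reviewing many trace lines, it can help to extract the current and best servers programmatically. The following Python sketch parses the "Current ip / best ip" message in the format shown above. The function names and the host names in the usage example are hypothetical. The ts values in the sample appear to be Unix epoch seconds (1269934592 corresponds to 07:36:32 UTC on 30 March 2010, which matches the 09:36:32 local time stamp at UTC+2), but what the Policy Server records in ts is not stated here, so treat that interpretation as an assumption.

```python
import re
from datetime import datetime, timezone

# Matches the "Current ip ... best ip ..." trace message shown in this article.
# Whitespace after "ts:" is optional because the sample line uses both forms.
TRACE_PATTERN = re.compile(
    r"Current ip: (?P<current>[^,\s]+), ts:\s*(?P<current_ts>\d+); "
    r"best ip: (?P<best>[^,\s]+), ts:\s*(?P<best_ts>\d+)"
)


def parse_failover_trace(line):
    """Return the servers and ts values from a trace line, or None if absent."""
    match = TRACE_PATTERN.search(line)
    if match is None:
        return None
    return {
        "current": match.group("current"),
        "current_ts": int(match.group("current_ts")),
        "best": match.group("best"),
        "best_ts": int(match.group("best_ts")),
    }


def ts_to_utc(ts):
    """Interpret a ts value as Unix epoch seconds (an assumption) in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


if __name__ == "__main__":
    sample = (
        "[03/30/2010][09:36:32][2388][3316][][][]"
        "[Current ip: ldap1.example.com:389, ts:1269934592; "
        "best ip: ldap2.example.com:389, ts: 1269934568]"
    )
    print(parse_failover_trace(sample))
```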

Failover can occur for several reasons. They follow from the way the LDAP provider sets up and maintains its connections.

After a user directory has been configured, the LDAP provider initializes related internal structures upon receiving the first request. The initialization consists of the following steps:

  1. For each fail-over group a ping thread is created and started.
  2. The first available server is selected from a fail-over group. If there are no available servers in the first fail-over group, the second is tried and so on.
  3. Search and user connections are created to the selected server.
  4. The load-balancing counter is incremented.

For subsequent requests, all steps except step 1 are executed.
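As an illustration only, the server selection in step 2 can be sketched in Python as follows. The function name select_server, the representation of fail-over groups as nested lists, and the is_available callback are assumptions made for this example; they do not reflect the Policy Server's internal implementation.

```python
def select_server(failover_groups, is_available):
    """Return (group_index, server) for the first available server.

    failover_groups: list of groups, each an ordered list of server names.
    is_available:    callable that reports whether a server can be reached.
    Returns None when no server in any group is available.
    """
    for group_index, group in enumerate(failover_groups):
        for server in group:
            if is_available(server):
                # The first available server in the earliest group wins.
                return group_index, server
    return None


if __name__ == "__main__":
    # Hypothetical configuration: two fail-over groups of two servers each.
    groups = [["ldap1:389", "ldap2:389"], ["ldap3:389", "ldap4:389"]]
    reachable = {"ldap2:389", "ldap3:389"}
    print(select_server(groups, lambda server: server in reachable))
```

In this sketch a later group is consulted only when every server in the earlier groups is unavailable, which mirrors the order described in step 2.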

The LDAP servers are placed in failover and load-balancing order according to the user directory configuration. Each configured server is part of a fail-over group. Upon receiving a request, the Policy Server selects the first available server from the current fail-over group and establishes user and search connections to that server if they are not already established.

These connections are maintained until any of the following happens:

  • An LDAP request returns with a network error. The connections are then re-initialized.
  • The ping thread detects that an LDAP server listed before the current server in the same fail-over group is now available.
  • The ping thread detects that the server is unavailable. The connections are then re-initialized.
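The three conditions above can be expressed as a small decision function. This is a minimal Python sketch under stated assumptions: the function name connection_change_reason, its parameters, and the returned strings are hypothetical, and the order in which the conditions are checked is a choice made for this example.

```python
def connection_change_reason(group, current, available, network_error=False):
    """Return why the current connections would no longer be kept, or None.

    group:         ordered list of servers in the current fail-over group.
    current:       the server the connections currently point to.
    available:     set of servers the ping thread considers available.
    network_error: True if an LDAP request returned with a network error.
    """
    if network_error:
        return "network error"
    if current not in available:
        return "current server unavailable"
    # Servers listed before the current one take precedence once they are back.
    for server in group[:group.index(current)]:
        if server in available:
            return "earlier server available"
    return None


if __name__ == "__main__":
    group = ["ldap1:389", "ldap2:389", "ldap3:389"]
    # ldap1 has come back while connections point to ldap2.
    print(connection_change_reason(group, "ldap2:389", {"ldap1:389", "ldap2:389"}))
```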

Some more details:

At Policy Server startup, three connections are made to each User Store:

  • Two connections are used for LDAP searches and binds. These are the connections SiteMinder uses to search for users.
  • The third connection is used by the ping thread to test every 30 seconds (as defined in Ping timeout) whether the LDAP servers and connections are available.
  • During initialization, a separate ping thread is created for each fail-over group. For each server in the group, the thread creates a ping connection and puts it in the ping connection list. Periodically (the default period is 30 seconds) the thread validates the status of all connections in the list. After the last connection in the list has been validated, the thread sleeps for 30 seconds and starts a new round.
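The ping cycle described above (validate every connection in the list, sleep, start a new round) can be sketched in Python as follows. This is an illustration, not the Policy Server's code: the class name PingThread, the check callback that stands in for validating a ping connection, and the status dictionary are all assumptions made for this example.

```python
import threading


class PingThread(threading.Thread):
    """Periodically validates every server in one fail-over group."""

    def __init__(self, servers, check, interval=30.0):
        super().__init__(daemon=True)
        self.servers = list(servers)   # plays the role of the ping connection list
        self.check = check             # callable: server -> True if it responds
        self.interval = interval       # sleep between rounds, in seconds
        self.status = {}               # latest availability result per server
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            # One round: validate each entry in list order.
            for server in self.servers:
                self.status[server] = self.check(server)
            # Sleep until the next round, waking early if stop() is called.
            self._stop_event.wait(self.interval)

    def stop(self):
        self._stop_event.set()


if __name__ == "__main__":
    # Hypothetical servers; the lambda pretends that only ldap1 responds.
    pinger = PingThread(["ldap1:389", "ldap2:389"],
                        check=lambda server: server == "ldap1:389",
                        interval=1.0)
    pinger.start()
    pinger.stop()
    pinger.join()
    print(pinger.status)
```

Waiting on an event instead of a plain sleep lets the thread shut down promptly without waiting out the full interval.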