There are 4 LDAP servers in the LDAP cluster. When one of the nodes is down, more than 80% of gateway requests fail, rather than the expected 25%.
The LDAP identity provider is configured with 2 LDAP URLs pointing to the load balancer of the cluster with different ports.
API Gateway 10.x, 11.x
The LDAP URL is configured with the hostname of the load balancer in front of the LDAP cluster. When an LDAP connection fails, the gateway blacklists that LDAP URL for one minute by default. Because the URL points to the load balancer rather than to an individual node, the whole cluster is unavailable to the gateway during that period, not just the failed node.
Log snippet:
{"package":"com.l7tech.server.identity.ldap.LdapUrlProviderImpl","level":"INFO","log":{"service":"Login","message":"Blacklisting url for next 60 seconds.....}
When multiple LDAP URLs are configured, the gateway uses the first available LDAP URL and keeps using it until it fails. It then blacklists that URL and tries the next LDAP URL.
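The selection behavior described above can be modeled with a small sketch. This is not the gateway's implementation: the class UrlProvider, its method names, and the example URLs are hypothetical and only illustrate a sticky "first available URL" choice combined with a timed blacklist.

```python
import time


class UrlProvider:
    """Hypothetical model of sticky URL selection with a timed blacklist."""

    def __init__(self, urls, reconnect_timeout_ms=60000, clock=time.monotonic):
        self.urls = list(urls)
        self.timeout_s = reconnect_timeout_ms / 1000.0
        self.clock = clock  # injectable so the model can be tested without sleeping
        self.blacklisted_until = {}  # url -> time at which it becomes usable again

    def current_url(self):
        """Return the first URL that is not blacklisted, or None if all are."""
        now = self.clock()
        for url in self.urls:
            if self.blacklisted_until.get(url, float("-inf")) <= now:
                return url
        return None

    def mark_failed(self, url):
        """Blacklist a URL for the reconnect timeout after a failed connection."""
        self.blacklisted_until[url] = self.clock() + self.timeout_s


# Two URLs that both point at the same (assumed) load balancer hostname.
now = [0.0]
provider = UrlProvider(
    ["ldap://lb.example.com:389", "ldap://lb.example.com:1389"],
    clock=lambda: now[0],
)
```

In this model, one failed connection through each URL leaves current_url() returning None until the timeout expires, even though only one backend node is down. With reconnect_timeout_ms set to 0, a failed URL is eligible again on the very next request.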
The best solution is to configure the load balancer properly so that it routes requests only to the available nodes in the cluster.
On the Gateway side, there are two options:
1. Change the following cluster-wide property (this will affect all identity providers): ldap.reconnect.timeout = 60000 (default, in milliseconds). Set it to 0.
2. Override the default for the failing identity provider only, using the "Reconnect Timeout" setting in the properties of the LDAP provider (LDAP Wizard).