Background of issue:
There are four LDAP servers in the LDAP cluster, but when one of the nodes is down, the Gateway sees an error rate of more than 80%, not the expected 25%.
The LDAP identity provider is configured with two LDAP URLs that point to the load balancer of the cluster on different ports.
Gateway 10.x, 11.x
The LDAP URL is configured to use the hostname of the load balancer of the LDAP cluster.
When an LDAP connection fails, the Gateway blacklists the LDAP URL for one minute by default. Because the URL points to the load balancer rather than to a single node, the whole cluster (not just the failed node) is unavailable through that URL during that period.
Log snippet:
{"package":"com.l7tech.server.identity.ldap.LdapUrlProviderImpl","level":"INFO","log":{"service":"Login" message":"Blacklisting url for next 60 seconds.....}
When multiple LDAP URLs are configured, the Gateway uses the first available LDAP URL and keeps using it until it fails; it then blacklists that URL and tries the next LDAP URL.
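The following Python sketch is a simplified model of the behavior described above, not the Gateway's actual implementation. It shows why one failed node out of four can produce far more than 25% errors. The class and function names, the hostnames, the round-robin load balancer, and the rate of one request per second are all assumptions made for illustration; only the 60-second blacklist period and the "first available URL" rule come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class UrlFailover:
    """Hypothetical model: use the first non-blacklisted URL."""
    urls: list
    blacklist_seconds: float = 60.0  # default blacklist period from the log
    blacklisted_until: dict = field(default_factory=dict)

    def pick(self, now):
        """Return the first URL that is not blacklisted at time `now`, or None."""
        for url in self.urls:
            if self.blacklisted_until.get(url, 0.0) <= now:
                return url
        return None

    def report_failure(self, url, now):
        """Blacklist `url` for `blacklist_seconds` starting at `now`."""
        self.blacklisted_until[url] = now + self.blacklist_seconds


def simulate(num_requests, nodes_up, interval=1.0):
    """Return the fraction of failed requests.

    Assumes one request every `interval` seconds and a load balancer that
    round-robins over all nodes, including the ones that are down.
    """
    failover = UrlFailover([
        "ldap://lb.example.com:389",   # hypothetical load balancer URLs
        "ldap://lb.example.com:1389",
    ])
    next_node = 0
    errors = 0
    for i in range(num_requests):
        now = i * interval
        served = False
        while not served:
            url = failover.pick(now)
            if url is None:
                break  # every URL is blacklisted, so the request fails
            node_ok = nodes_up[next_node % len(nodes_up)]
            next_node += 1
            if node_ok:
                served = True
            else:
                failover.report_failure(url, now)
        if not served:
            errors += 1
    return errors / num_requests


# One of four nodes down: the error rate is far above the naive 25%.
print(simulate(630, [True, True, True, False]))
```

In this model, each time the load balancer hands a connection to the dead node, the Gateway blacklists a URL that fronts the whole cluster. Once both URLs are blacklisted, every request fails until the first blacklist entry expires, regardless of how many nodes are healthy.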
The best solution is to configure the load balancer properly so that it routes requests only to the available nodes in the cluster.
On the Gateway side, there are two options: