The auth-mgr pod sporadically loses its connection to CA Directory and does not recover for hours, sometimes a day or more. This appears to happen after a period of very low activity: for instance, no testing occurs on a weekend, and the last incident was on a Monday. Were there any changes to LDAP connection pooling in either of the first two June releases? Does the LDAP connection pooling code clear its cache as needed to prevent use of a stale connection?
Connection Pooling: true","api":"/auth/v1/authenticate","appId":"6594d19b-c813-452e-aa75-b8c63b2acca3","appName":"AdminConsole","clientId":"b2a34625-58b1-4d2a-8dc4-9a38237d83ae","clientIp":"10.10.10.10","clientTid":"337a2550-0609-422a-853e-a8c494d9e98e","clientTxnId":"12877383-704f-4aaa-9be4-f69d3d131edf","dt.span_id":"b10ffbdec9b0d4b2","dt.trace_id":"4ec1c930232d5979f7739b949e8d888c","dt.trace_sampled":"true","httpMethod":"POST","relVersion":"1.0","service":"authmgr","sid":" ","sub":null,"subType":"USER","tid":"5898e5d4-c9d9-4452-b2fd-32f3f3cdfb38","tname":"default","txnId":"6ec0a750-5940-46e8-a884-efc71c2e92a5","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36 Edg/103.0.1264.49","userGuid":" ","userIp":null,"userLoginId":"","userRiskLevel":" ","userRiskScore":"0","userUniversalId":" "}
{"timestamp":"2022-08-16T14:58:52Z","type":"log","level":"error","thread":"https-jsse-nio-8086-exec-1","msg":"Error getting LDAP connection (class javax.naming.CommunicationException). LDAP Config: 'smldap-dev (Type: ldap, ID: 908a6f18-2845-465b-8e64-85193610725f, URL: ldaps://smldap-dev.com:10636)', BindDN: 'cn=hub,ou=sso,ou=administrators,dc=forward,dc=com'. Details: 'smldap-dev.com:10636'","api":"/auth/v1/authenticate","appId":"6594d19b-c813-452e-aa75-b8c63b2acca3","appName":"AdminConsole","clientId":"b2a34625-58b1-4d2a-8dc4-9a38237d83ae"
{"timestamp":"2022-08-16T14:58:52Z","type":"log","level":"warn","thread":"https-jsse-nio-8086-exec-1","msg":"Error disambiguating user with login id '[email protected]'. Error Code 'SERVICE_UNAVAILABLE', Error Message 'Service unavailable (Unable to connect (smldap-dev.com:10636))'","api":"/auth/v1/authenticate","appId":"6594d19b-c813-452e-aa75-b8c63b2acca3","appName":"AdminConsole","clientId":"b2a34625-58b1-4d2a-8dc4-9a38237d83ae","clientIp":"10.10.10.10","clientTid":"337a2550-0609-422a-853e-a8c494d9e98e","clientTxnId":"12877383-704f-4aaa-9be4-f69d3d131edf","dt.span_id":"b10ffbdec9b0d4b2","dt.trace_id":"4ec1c930232d5979f7739b949e8d888c","dt.trace_sampled":"true","httpMethod":"POST","relVersion":"1.0","service":"authmgr","sid":" ","sub":null,"subType":"USER","tid":"5898e5d4-c9d9-4452-b2fd-32f3f3cdfb38","tname":"default","txnId":"6ec0a750-5940-46e8-a884-efc71c2e92a5","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36 Edg/103.0.1264.49","userGuid":" ","userIp":null,"userLoginId":"","userRiskLevel":" ","userRiskScore":"0","userUniversalId":"
Release : 1.x
Component : VIP AuthHub
The smldap-dev LDAP server had two nodes behind it, and the two nodes had different certificates: one pointed to the QA environment and the other to the Dev environment. AuthHub performs certificate hostname checking, so requests routed to the node presenting the QA certificate failed. The resolution is to install the proper certificates on both nodes, or to remove the QA node from the LDAP pool.
The AuthHub PM team has been made aware of the request to add the flexibility to skip certificate hostname validation.
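To confirm this kind of mismatch, it can help to inspect the certificate each node behind the LDAP hostname presents. The following is a minimal diagnostic sketch in Python, not part of AuthHub: the function names, the `cafile` parameter, and the simplified wildcard rule are assumptions made for illustration, and `smldap-qa.com` in the usage note is a hypothetical name. It connects to every resolved address of the hostname, reads the DNS names from the certificate's subjectAltName, and checks whether the hostname the client uses is among them.

```python
import socket
import ssl


def hostname_matches(hostname, san_dns_names):
    """Return True if hostname matches any SAN DNS name.

    Simplified rule (assumption): exact match, or a '*.' wildcard
    that covers exactly one leftmost label.
    """
    host = hostname.lower().rstrip(".")
    for name in san_dns_names:
        pattern = name.lower().rstrip(".")
        if pattern == host:
            return True
        if pattern.startswith("*."):
            # Compare everything after the first label of the host
            # with everything after the '*.' of the pattern.
            _, _, host_rest = host.partition(".")
            if host_rest and host_rest == pattern[2:]:
                return True
    return False


def san_names_per_node(hostname, port, cafile=None, timeout=5.0):
    """Map each resolved IP of hostname to the SAN DNS names it presents.

    The certificate chain is still verified (cafile may point at an
    internal CA bundle), but hostname checking is turned off so that a
    mismatching node can be inspected instead of rejected.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.check_hostname = False  # we do the hostname comparison ourselves

    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})

    results = {}
    for ip in addresses:
        try:
            with socket.create_connection((ip, port), timeout=timeout) as sock:
                with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
                    cert = tls.getpeercert()
            results[ip] = [
                value
                for kind, value in cert.get("subjectAltName", ())
                if kind == "DNS"
            ]
        except OSError as exc:  # includes ssl.SSLError
            results[ip] = "error: {}".format(exc)
    return results
```

For example, `san_names_per_node("smldap-dev.com", 10636)` would return one entry per node address; passing each list of names to `hostname_matches("smldap-dev.com", names)` shows which node would fail the hostname check. A node whose certificate lists only a QA name (say, the hypothetical `smldap-qa.com`) would return `False`.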