A SiteMinder Access Gateway (SPS) running on Windows crashes frequently showing a problem with the SspiCli.dll. An mdmp file is created too.
Users report a 503 error message in the browser when attempting to access applications protected by Access Gateway (SPS).
The crash and trace files show the following:
hs_err_pid1664.log
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000007fc135e9b5b, pid=1664, tid=0x0000000000001f6c
#
# JRE version: Java(TM) SE Runtime Environment (8.0_172-b11) (build 1.8.0_172-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C [SspiCli.dll+0x9b5b]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 5848 com.netegrity.proxy.jagent.proxy.CSmJavaAgentFacadeProxyImpl.doJNIProcessRequest(Ljava/lang/String;Lcom/netegrity/proxy/jagent/JavaSerializedAgentData;)I (0 bytes) @ 0x00000000025da0a3 [0x00000000025da040+0x63]
J 7625 C2 com.netegrity.proxy.ProxyValve.processRequest(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;Lcom/netegrity/proxy/VirtualHost;Ljava/lang/String;Z)V (1967 bytes) @ 0x0000000002cb3b3c [0x0000000002cb2c00+0xf3c]
time: Tue Mar 26 08:46:39 2019
Debug Diag
In hs_err_pid .mdmp the assembly instruction at
sspicli!AcceptSecurityContext+e6 in C:\Windows\System32\sspicli.dll
from Microsoft Corporation has caused an access violation exception
(0xC0000005) when trying to read from memory location 0x00002744 on
thread 31
Visual Studio
Unhandled exception at 0x00007FFEECA1F586 (sspicli.dll) in
hs_err_pid1224.mdmp: 0xC0000005: Access violation reading location
0x0000272700002744. occurred
The process crashes during NTLM authentication:
SPStrace.log:
[03/26/2019][08:46:37][1664][8044][<Transaction ID>][IsResourceProtected][Resource is protected from Policy Server.]
[03/26/2019][08:46:37][1664][8044][<Transaction ID>][ProcessResponses][Calling SM_WAF_HTTP_PLUGIN->ProcessResponses.]
[03/26/2019][08:46:37][1664][8044][<Transaction ID>][CSmHttpPlugin::ProcessResponses][Processing IsProtected responses.]
[03/26/2019][08:46:37][1664][8044][<Transaction ID>][ProcessResponses][SM_WAF_HTTP_PLUGIN->ProcessResponses returned SmSuccess.]
[03/26/2019][08:46:37][1664][8044][<Transaction ID>][ProcessResponses][Calling SM_WAF_AG_PLUGIN->ProcessResponses.]
[03/26/2019][08:46:37][1664][8044][<Transaction ID>][ProcessResponses][SM_WAF_AG_PLUGIN->ProcessResponses returned SmNoAction.]
[03/26/2019][08:46:37][1664][8044][<Transaction ID>][CSmCredentialManager::GatherAdvancedAuthCredentials][Calling SM_WAF_HTTP_PLUGIN->ProcessAdvancedAuthCredentials.]
[03/26/2019][08:46:37][1664][8044][<Transaction ID>][SmNtc::getCredentials][user-agent received Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)]
[03/26/2019][08:46:37][1664][8044][<Transaction ID>][SmNtc::getCredentials][Request for SSPI NTLM Authentication]
[03/26/2019][08:46:37][1664][8044][<Transaction ID>][DeleteCookie][Deleted cookie 'SM_NTLMCTX'.]
How can we solve this?
SPS / Access Gateway 12.7 all service packs, 12.8 all service packs.
Windows OS.
Using IWA Authentication or IWA Failover to Forms.
The crash in SspiCli.dll is caused by a problem in the Microsoft code. As a workaround, configure session persistence (the sticky bit) on the load balancer, and add the ACO parameter usentlmmapforntlmauth with the value "yes".
Session persistence prevents the load balancer from forwarding the NTLM type 1 authentication request from a browser (or another client) to one SPS box and then forwarding the continuation of the authentication process, the NTLM type 3 request, to a different SPS box.
About the usentlmmapforntlmauth=yes ACO parameter:
When this parameter is set to "yes", the SPS uses an internal map to track NTLM request types. If an NTLM type 3 request reaches an SPS that did not receive a prior NTLM type 1 request from the same client in this authentication flow, the SPS treats the request as type 1. As a result, CA SSO does not send out-of-sequence messages to the AcceptSecurityContext() function, which avoids the crash.
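The tracking behavior described above can be illustrated with a small sketch. This is not the SPS implementation: the class name NtlmFlowTracker, the method effective_type, and the use of a client key string are all hypothetical and only model the rule "a type 3 message without a prior type 1 from the same client is handled as type 1".

```python
class NtlmFlowTracker:
    """Hypothetical model of an internal map that tracks NTLM request types."""

    def __init__(self):
        # Clients that have started a handshake on THIS server (assumption:
        # a client is identified by some key such as connection or address).
        self._pending = set()

    def effective_type(self, client_key, msg_type):
        """Return the NTLM message type the server should act on."""
        if msg_type == 1:
            # Normal start of the handshake: remember the client.
            self._pending.add(client_key)
            return 1
        if msg_type == 3:
            if client_key in self._pending:
                # Type 1 was seen here first, so type 3 is in sequence.
                self._pending.discard(client_key)
                return 3
            # Out-of-sequence type 3 (type 1 went to another server):
            # restart the handshake instead of passing it on.
            self._pending.add(client_key)
            return 1
        return msg_type


tracker = NtlmFlowTracker()
print(tracker.effective_type("client-a", 3))  # out of sequence, handled as type 1
print(tracker.effective_type("client-a", 3))  # now in sequence
```

In this model the out-of-sequence message never reaches the step that would call AcceptSecurityContext() with unexpected input; the handshake simply starts over on the server that received it.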
Here is an example of how to troubleshoot and observe this behavior:
A code review of the crashing stack frame, SspiCli.dll+0xf586 or sspicli!AcceptSecurityContext+e6, shows that the crashing process received the NTLM authentication messages out of order.
For example, the Access Gateway server receives the AUTHENTICATE_MESSAGE for a request before it receives the NEGOTIATE_MESSAGE.
The NTLM Authentication Protocol consists of three message types used
during authentication and one message type used for message integrity
after authentication has occurred. The authentication messages:
NEGOTIATE_MESSAGE (2.2.1.1)
CHALLENGE_MESSAGE (2.2.1.2)
AUTHENTICATE_MESSAGE (2.2.1.3)
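When reviewing a network or HTTP trace, it can help to identify which of these three messages a given Authorization or WWW-Authenticate header carries. The following is a minimal sketch, assuming the header uses the "NTLM <base64 token>" form; it relies on the NTLM message layout, where an 8-byte "NTLMSSP\0" signature is followed by a 4-byte little-endian message type. The function name ntlm_message_type is illustrative only.

```python
import base64
import struct

NTLM_SIGNATURE = b"NTLMSSP\x00"
MESSAGE_TYPES = {
    1: "NEGOTIATE_MESSAGE",
    2: "CHALLENGE_MESSAGE",
    3: "AUTHENTICATE_MESSAGE",
}


def ntlm_message_type(header_value):
    """Return the NTLM message name carried in an 'NTLM <base64>' header value."""
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.upper() != "NTLM" or not token.strip():
        raise ValueError("not an NTLM header value")
    raw = base64.b64decode(token.strip())
    if len(raw) < 12 or raw[:8] != NTLM_SIGNATURE:
        raise ValueError("missing NTLMSSP signature")
    # The message type is a 32-bit little-endian integer at offset 8.
    (msg_type,) = struct.unpack_from("<I", raw, 8)
    return MESSAGE_TYPES.get(msg_type, "UNKNOWN(%d)" % msg_type)


# Build a minimal synthetic type 1 token for demonstration purposes.
sample = base64.b64encode(NTLM_SIGNATURE + struct.pack("<I", 1) + b"\x00" * 4)
print(ntlm_message_type("NTLM " + sample.decode("ascii")))
```

Applying this to the headers captured on each SPS server shows whether a server received an AUTHENTICATE_MESSAGE without having first received the NEGOTIATE_MESSAGE.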
This "out of order" flow is a symptom of a network load balancer, or a similar device in front of the Access Gateway server, that is not configured for sticky sessions.
While troubleshooting this issue, we saw that during the NTLM authentication flow, the requests were sent to more than one SPS / Access Gateway in the server farm.
We made the following changes to each Apache instance within SPS on the servers to generate a unique header.
EXAMPLE:
In the httpd.conf file (\CA\secure-proxy\httpd\conf)
#Adding load headers_module for testing remove after
LoadModule headers_module modules/mod_headers.so
<IfModule headers_module>
#RequestHeader unset DNT env=bad_DNT
Header set ServerName "MY-SPS-SVR01"
</IfModule>
NOTE: The Access Gateway services need to be restarted after making this change.
While reproducing this issue, we can see that the header created by Apache changes during the NTLM authentication flow.
Example:
ServerName: MY-SPS-SVR01
Then on the next response, we would see
ServerName: MY-SPS-SVR02
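This showed that the load balancer in front of the Access Gateway servers was not maintaining a sticky session for the requests: consecutive requests of the same NTLM authentication flow were answered by different servers.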
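The same check can be done on a list of ServerName header values collected, in order, from the responses of one authentication flow (for example, copied from a browser trace). The following is a minimal sketch; the function name find_backend_switches is illustrative, and the sample values reuse the test header names from the example above.

```python
def find_backend_switches(server_names):
    """Return (index, previous, current) for each point where the header value changes."""
    switches = []
    for i in range(1, len(server_names)):
        if server_names[i] != server_names[i - 1]:
            switches.append((i, server_names[i - 1], server_names[i]))
    return switches


# Header values observed across consecutive responses of one NTLM flow.
observed = ["MY-SPS-SVR01", "MY-SPS-SVR01", "MY-SPS-SVR02"]
for index, previous, current in find_backend_switches(observed):
    print("response %d: backend changed from %s to %s" % (index, previous, current))
```

An empty result means every response in the flow came from the same server, which is what a correctly configured sticky session should produce.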
To solve the issue, enable the sticky bit on the load balancer and add the parameter usentlmmapforntlmauth=yes to the Access Gateway (SPS) agent configuration object (ACO).
On the F5 load balancer, set Sticky Sessions / Session Persistence / Sticky-bit.
On ProxySG, set "cookie persistence".