SiteMinder Access Gateway Crashing Intermittently

Article ID: 131594

Products

CA Single Sign On Secure Proxy Server (SiteMinder)
CA Single Sign On SOA Security Manager (SiteMinder)
CA Single Sign-On SITEMINDER

Issue/Introduction

A SiteMinder Access Gateway (SPS) running on Windows crashes frequently, and the crash report points to SspiCli.dll. An .mdmp file is also created.

Users report a 503 error in the browser when they try to access an application protected by Access Gateway (SPS).

The crash and trace files show the following:

hs_err_pid1664.log 

  # EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000007fc135e9b5b, pid=1664, tid=0x0000000000001f6c
  #
  # JRE version: Java(TM) SE Runtime Environment (8.0_172-b11) (build 1.8.0_172-b11)
  # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode windows-amd64 compressed oops)
  # Problematic frame:
  # C [SspiCli.dll+0x9b5b]

  Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)

  J 5848 com.netegrity.proxy.jagent.proxy.CSmJavaAgentFacadeProxyImpl.doJNIProcessRequest(Ljava/lang/String;Lcom/netegrity/proxy/jagent/JavaSerializedAgentData;)I (0 bytes) @ 0x00000000025da0a3 [0x00000000025da040+0x63]

  J 7625 C2 com.netegrity.proxy.ProxyValve.processRequest(Lorg/apache/catalina/connector/Request;Lorg/apache/catalina/connector/Response;Lcom/netegrity/proxy/VirtualHost;Ljava/lang/String;Z)V (1967 bytes) @ 0x0000000002cb3b3c [0x0000000002cb2c00+0xf3c]

  time: Tue Mar 26 08:46:39 2019 

Debug Diag

  In hs_err_pid .mdmp the assembly instruction at
  sspicli!AcceptSecurityContext+e6 in C:\Windows\System32\sspicli.dll
  from Microsoft Corporation has caused an access violation exception
  (0xC0000005) when trying to read from memory location 0x00002744 on
  thread 31

Visual Studio

  Unhandled exception at 0x00007FFEECA1F586 (sspicli.dll) in
  hs_err_pid1224.mdmp: 0xC0000005: Access violation reading location
  0x0000272700002744. occurred

The process crashes during NTLM authentication:

SPStrace.log:

  [03/26/2019][08:46:37][1664][8044][<Transaction ID>][IsResourceProtected][Resource is protected from Policy Server.] 

  [03/26/2019][08:46:37][1664][8044][<Transaction ID>][ProcessResponses][Calling SM_WAF_HTTP_PLUGIN->ProcessResponses.] 

  [03/26/2019][08:46:37][1664][8044][<Transaction ID>][CSmHttpPlugin::ProcessResponses][Processing IsProtected responses.] 

  [03/26/2019][08:46:37][1664][8044][<Transaction ID>][ProcessResponses][SM_WAF_HTTP_PLUGIN->ProcessResponses returned SmSuccess.] 

  [03/26/2019][08:46:37][1664][8044][<Transaction ID>][ProcessResponses][Calling SM_WAF_AG_PLUGIN->ProcessResponses.] 

  [03/26/2019][08:46:37][1664][8044][<Transaction ID>][ProcessResponses][SM_WAF_AG_PLUGIN->ProcessResponses returned SmNoAction.] 

  [03/26/2019][08:46:37][1664][8044][<Transaction ID>][CSmCredentialManager::GatherAdvancedAuthCredentials][Calling SM_WAF_HTTP_PLUGIN->ProcessAdvancedAuthCredentials.] 

  [03/26/2019][08:46:37][1664][8044][<Transaction ID>][SmNtc::getCredentials][user-agent received Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)] 

  [03/26/2019][08:46:37][1664][8044][<Transaction ID>][SmNtc::getCredentials][Request for SSPI NTLM Authentication] 

  [03/26/2019][08:46:37][1664][8044][<Transaction ID>][DeleteCookie][Deleted cookie 'SM_NTLMCTX'.] 

How can we solve this?
 

Environment

SPS / Access Gateway 12.7 all service packs, 12.8 all service packs.
Windows OS.
Using IWA Authentication or IWA Failover to Forms. 
 

Cause

The crash in SspiCli.dll is caused by a problem in the Microsoft code. To work around it, configure sticky sessions (the sticky bit) on the load balancer, add the ACO parameter usentlmmapforntlmauth, and set it to "yes".

Sticky sessions prevent the load balancer from forwarding the NTLM type 1 authentication request from a browser (or another client) to one SPS box and then forwarding the continuation of the authentication process, the NTLM type 3 request, to a different SPS box.

About the usentlmmapforntlmauth=yes ACO parameter:

When this parameter is set to "yes", the SPS uses an internal map to track NTLM request types. If an NTLM type 3 request is sent to an SPS that did not receive a prior NTLM type 1 request from the same client in this authentication flow, the SPS treats the request as type 1. As a result, CA SSO does not send out-of-sequence messages to the AcceptSecurityContext() function, which avoids the crash.
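The tracking behavior described above can be illustrated with a minimal Python sketch. This is not the product's implementation; the class name `NtlmSequenceTracker`, the `client_key` argument (for example, a connection identifier), and the choice to record the client again after a downgraded type 3 request are all assumptions made for illustration.

```python
class NtlmSequenceTracker:
    """Hypothetical model of a per-node map of clients with an NTLM handshake in progress."""

    def __init__(self):
        # Clients that have sent a type 1 request to this node and have
        # not yet completed the handshake with a type 3 request.
        self._pending = set()

    def classify(self, client_key, msg_type):
        """Return the NTLM message type this node should act on."""
        if msg_type == 1:
            self._pending.add(client_key)
            return 1
        if msg_type == 3:
            if client_key in self._pending:
                # In-sequence type 3: the handshake completes on this node.
                self._pending.discard(client_key)
                return 3
            # Out-of-sequence type 3: this node never saw the type 1, so
            # treat the request as type 1 and restart the handshake here
            # (assumption: the client is now tracked as pending).
            self._pending.add(client_key)
            return 1
        raise ValueError("unexpected NTLM message type: %r" % (msg_type,))
```

In this model, a type 3 request that arrives at a node with no record of the client is never passed on as a type 3 message, which matches the effect described for the parameter.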

Here is an example of how to troubleshoot and observe this behavior:

A code review of the crashing stack (SspiCli.dll+0xf586, or sspicli!AcceptSecurityContext+e6) shows that the crashing process received the NTLM authentication messages out of order.

For example, the Access Gateway server receives the AUTHENTICATE_MESSAGE for a request before the NEGOTIATE_MESSAGE.

The NTLM Authentication Protocol consists of three message types used
during authentication and one message type used for message integrity
after authentication has occurred. The authentication messages, with
their section numbers in the Microsoft NTLM specification (MS-NLMP), are:

NEGOTIATE_MESSAGE (2.2.1.1)
CHALLENGE_MESSAGE (2.2.1.2)
AUTHENTICATE_MESSAGE (2.2.1.3)
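
When reading a network capture or trace, it can help to identify which of these messages an HTTP header carries. Each NTLM message starts with the 8-byte signature "NTLMSSP\0" followed by a 4-byte little-endian message type (1, 2, or 3). The following Python sketch decodes that field from an "NTLM <base64>" header value; the function names are illustrative and are not part of any product.

```python
import base64
import struct

NTLM_SIGNATURE = b"NTLMSSP\x00"
MESSAGE_NAMES = {
    1: "NEGOTIATE_MESSAGE",
    2: "CHALLENGE_MESSAGE",
    3: "AUTHENTICATE_MESSAGE",
}


def ntlm_message_type(header_value):
    """Return the NTLM message name carried by an Authorization or
    WWW-Authenticate header value of the form 'NTLM <base64 token>'."""
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.upper() != "NTLM" or not token.strip():
        raise ValueError("not an NTLM header value")
    raw = base64.b64decode(token.strip())
    if len(raw) < 12 or raw[:8] != NTLM_SIGNATURE:
        raise ValueError("missing NTLMSSP signature")
    # Bytes 8..11 hold the MessageType field, little-endian.
    (msg_type,) = struct.unpack("<I", raw[8:12])
    return MESSAGE_NAMES.get(msg_type, "unknown (%d)" % msg_type)


def make_header(msg_type):
    """Build a minimal synthetic header (signature and type only) for testing."""
    raw = NTLM_SIGNATURE + struct.pack("<I", msg_type)
    return "NTLM " + base64.b64encode(raw).decode("ascii")
```

For example, `ntlm_message_type("NTLM TlRMTVNTUAABAAAA")` returns "NEGOTIATE_MESSAGE". Real tokens are longer, but the first 12 bytes have the same layout.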

This out-of-order flow is a symptom of a network load balancer, or a similar device in front of the Access Gateway server, that is not configured for sticky sessions.

While troubleshooting this issue, we saw that during the NTLM authentication flow the requests were sent to more than one SPS / Access Gateway in the server farm.

To confirm this, we changed the Apache instance within SPS on each server so that each server returns a unique header.

EXAMPLE:

In the httpd.conf file (\CA\secure-proxy\httpd\conf)
 

# Load headers_module for testing; remove after testing
LoadModule headers_module modules/mod_headers.so

<IfModule headers_module> 
#RequestHeader unset DNT env=bad_DNT 
# Use a different value on each server, for example "MY-SPS-SVR02" on the second server
Header set ServerName "MY-SPS-SVR01" 
</IfModule> 

NOTE: The Access Gateway services need to be restarted after making this change. 

While reproducing the issue, we saw the header set by Apache change during the NTLM authentication flow.

Example:

ServerName: MY-SPS-SVR01

Then, on the next response, we saw:

ServerName: MY-SPS-SVR02

This showed that the load balancer in front of the Access Gateway servers was not maintaining a sticky session for the requests.
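
The same check can be scripted. The following Python sketch sends several requests through the load balancer with one cookie jar and collects the ServerName test header from each response. It assumes the test header from the httpd.conf example is in place; the function names and the URL you pass in are placeholders, and error responses such as a 401 challenge may not carry the header, in which case those responses are ignored.

```python
import urllib.error
import urllib.request
from http.cookiejar import CookieJar


def fetch_server_names(url, attempts=10, header="ServerName"):
    """Request `url` several times and return the test header value seen each time."""
    # One cookie jar for all requests, so cookie-based persistence can take effect.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    names = []
    for _ in range(attempts):
        try:
            with opener.open(url, timeout=10) as resp:
                names.append(resp.headers.get(header))
        except urllib.error.HTTPError as err:
            # Error responses still expose their headers; the value is
            # None when the test header is absent.
            names.append(err.headers.get(header))
    return names


def distinct_backends(names):
    """Return the sorted distinct header values, ignoring responses without the header."""
    return sorted({name for name in names if name is not None})
```

If `distinct_backends(fetch_server_names(url))` returns more than one name, requests from a single client are reaching more than one Access Gateway server. An empty result means the header was never observed, so no conclusion can be drawn.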

Resolution

To solve the issue, enable sticky sessions (the sticky bit) on the load balancer and add the parameter usentlmmapforntlmauth=yes to the Access Gateway (SPS) Agent Configuration Object (ACO).

On the F5 load balancer, set Sticky Sessions / Session Persistence / Sticky-bit.
On ProxySG, set "cookie persistence".

Additional Information

https://httpd.apache.org/docs/2.4/mod/mod_headers.html