Error: Sm_AgentApi_Init Failed appears intermittently in the Event Viewer log

Article ID: 7676

Products

CA Single Sign On Secure Proxy Server (SiteMinder)
CA Single Sign On SOA Security Manager (SiteMinder)
CA Single Sign-On
SITEMINDER
CA Single Sign On Agents (SiteMinder)

Issue/Introduction

 

Most of the time, the IIS Web Agent fails to start and returns a 500
error when a client tries to connect.

No error appears in the Web Agent log or trace. The only evidence is
the following message reported in the Windows Event Viewer:

  Sm_AgentApi_Init Failed

When the problem occurs, the LLAWP process attempts to start but
fails.

 

Cause

 

This issue is caused by a timing difference between the OS shutdown
and the LLAWP shutdown.

In environments where the Web Agent machine is rebooted unexpectedly
and frequently (for instance, test or unstable environments), the
problem can arise from the mechanism the IIS Web Agent uses to manage
the connections between the IIS process, LLAWP and the Policy Server:

  - When the agent machine is restarted the system stops the services,
    including the IIS Server. This will result in a shutdown call to
    the Web Agent which will close all the connections.

  - When all the w3wp processes are shutting down they will unregister
    from the LLAWP in the Web Agent code.

  - When LLAWP detects that there are zero 'clients' connected to it,
    it waits 20 seconds and then shuts down, closing its connections
    to the Policy Server. The 20-second wait is by design: if IIS
    shuts down and is then restarted, the Web Agent initializes much
    faster because an already-initialized LLAWP process exists and no
    new one has to be created. This is useful in IIS because the w3wp
    process itself exits after a short period of inactivity (so this
    scenario happens routinely).

    However, if the Operating System shuts down before these 20
    seconds, LLAWP never closes the connections to the Policy
    server. This leaves connections open in the Policy Server machine,
    which does not know that the other side is no longer there. In the
    Policy Server machine, if running a 'netstat' command, the
    connections coming from the Web Agent IP and port appear in the
    'ESTABLISHED' state. Assuming the process of restarting the Web
    Agent machine is happening very frequently, this mechanism will
    cause a lot of connections to be left as 'ESTABLISHED' in the
    Policy Server machine, even if there is no corresponding LLAWP
    process on the other side.
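
The netstat check described above can be scripted. The following is a
minimal sketch, not a supported tool: it assumes 'netstat -an' style
output in which the last two columns of each TCP line are the foreign
address and the state, and the IP addresses and ports in the sample
are made up for illustration.

```python
def count_established(netstat_output: str, agent_ip: str) -> int:
    """Count ESTABLISHED TCP lines whose foreign address is agent_ip.

    Assumes each TCP line ends with '<foreign ip:port> <state>'.
    """
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].lower().startswith("tcp"):
            continue  # skip headers, blank lines and non-TCP lines
        foreign, state = fields[-2], fields[-1]
        remote_ip = foreign.rsplit(":", 1)[0]
        if state == "ESTABLISHED" and remote_ip == agent_ip:
            count += 1
    return count


# Hypothetical sample captured on the Policy Server machine.
SAMPLE = """\
  TCP    10.0.0.5:44443    10.0.0.9:49731    ESTABLISHED
  TCP    10.0.0.5:44443    10.0.0.9:49802    ESTABLISHED
  TCP    10.0.0.5:44443    10.0.0.7:50110    ESTABLISHED
  TCP    10.0.0.5:44443    10.0.0.9:49955    TIME_WAIT
"""

print(count_established(SAMPLE, "10.0.0.9"))
```

A count that keeps growing across reboots of the Web Agent machine,
while no LLAWP process is running there, would be consistent with the
abandoned connections described above.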

During frequent reboots, the Web Agent may at some point try to create
a connection and be assigned a local port number that matches one of
these abandoned connections. The Policy Server machine treats the
connection attempt as irregular and does not reply to the SYN with the
expected SYN/ACK. The packet it does send back causes a TCP sequence
mismatch, so the Web Agent machine discards it as out of sequence and
does not treat it as a response to its SYN.

The Web Agent then waits 2 seconds for the SYN/ACK and times out, so
it fails to initialize.
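
To illustrate what a connect attempt with a 2-second limit looks like
from the client side, here is a generic sketch using the Python
standard library. It is not the Web Agent's code; the function name
try_connect is hypothetical, and the host and port are whatever the
caller supplies.

```python
import socket


def try_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection completes within `timeout` seconds.

    A refused connection and a handshake that never completes are
    both reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts and refused connections
        return False
```

If the handshake never completes, as in the scenario above, the call
returns False once the timeout expires instead of blocking.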

 

Environment

 

Policy Server: all versions
Web Agent: R12.52 SP1 CR06 on IIS 7.5 (Windows 2008 R2)
IIS 7.5 running a mix of Classic and Integrated pipeline mode application pools (32 and 64 bit)

 

Resolution

 

This is not really an error or defect, but the expected behaviour in
the scenario described (very frequent reboots of the IIS agent
machine). Thus it is unlikely to happen in a production environment,
which should be very stable by design.

However, if there is a concern that such a situation may happen, there
are several possible corrective actions available to mitigate it or
eliminate it completely:

 - Turn on TCP keep-alive. To do this, set the following environment
   variable to 1 on both the Policy Server machine and the Web Agent
   machine:

   SM_ENABLE_TCP_KEEPALIVE

   - For Unix, set SM_ENABLE_TCP_KEEPALIVE=1 and export the variable.
   - For Windows, create a system environment variable named
     SM_ENABLE_TCP_KEEPALIVE with a value of 1.

 - Review possible Windows settings which wait for a process (LLAWP)
   to exit before performing the shutdown. These are specific to each
   Windows version and each environment, so they should be researched
   case by case.

 - Increase the port range on the Web Agent machine. The following
   article by Microsoft may be of use (1).
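
To confirm that the keep-alive variable is visible in a given
environment, a small check like the following can be run on each
machine. This is a sketch only: the helper name keepalive_enabled is
hypothetical, and it reads the environment of the process that runs
it, which may differ from the environment of the Policy Server or
Web Agent processes.

```python
import os


def keepalive_enabled(env=None) -> bool:
    """Return True if SM_ENABLE_TCP_KEEPALIVE is set to 1 in `env`.

    `env` defaults to the environment of the current process.
    """
    env = os.environ if env is None else env
    return env.get("SM_ENABLE_TCP_KEEPALIVE", "").strip() == "1"


if __name__ == "__main__":
    print("SM_ENABLE_TCP_KEEPALIVE enabled:", keepalive_enabled())
```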

 

Additional Information

 

(1) Microsoft article: "The default dynamic port range for TCP/IP has
    changed since Windows Vista and in Windows Server 2008"
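
Before changing the port range, it can help to see the current one.
On Windows Vista and later, 'netsh int ipv4 show dynamicport tcp'
reports the dynamic TCP port range, which by default starts at port
49152 and spans 16384 ports. The sketch below parses that output; the
exact text layout of the sample is an assumption and may vary by
Windows version or locale, and the function name is hypothetical.

```python
import re


def parse_dynamic_port_range(netsh_output: str):
    """Return (first_port, last_port) from netsh dynamicport output.

    Assumes lines of the form 'Start Port : N' and 'Number of Ports : M'.
    """
    start = re.search(r"Start Port\s*:\s*(\d+)", netsh_output)
    count = re.search(r"Number of Ports\s*:\s*(\d+)", netsh_output)
    if start is None or count is None:
        raise ValueError("unrecognized netsh output")
    first = int(start.group(1))
    return first, first + int(count.group(1)) - 1


# Assumed sample output showing the default range.
SAMPLE = """\
Protocol tcp Dynamic Port Range
---------------------------------
Start Port      : 49152
Number of Ports : 16384
"""

print(parse_dynamic_port_range(SAMPLE))  # (49152, 65535)
```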