Apache default (pre-fork) build leads to an inefficient Web Server and Web Agent deployment

Products

CA Single Sign-On SITEMINDER
CA Single Sign On Agents (SiteMinder)
CA Single Sign On Secure Proxy Server (SiteMinder)

Issue/Introduction

By default, Apache on UNIX builds with the "pre-fork" Multi-Processing Module (MPM). Pre-fork does not use threads: each request is handled by a single-threaded child process, and this creates an inefficient interface to the Policy Server.

The better solution is to build Apache with the "worker" MPM, which uses threads.

However, many sites are stuck with "pre-fork" mode because they cannot upgrade immediately.

Some methods exist for minimizing the impact.

Resolution

Regarding Apache pre-fork, the Apache documentation itself explains the trade-off:

Extending the modular design to this level of the server allows two important benefits (1):

Apache httpd can more cleanly and efficiently support a wide variety of operating systems. In particular, the Windows version of the server is now much more efficient, since mpm_winnt can use native networking features in place of the POSIX layer used in Apache httpd 1.3. This benefit also extends to other operating systems that implement specialized MPMs.

The server can be better customized for the needs of the particular site. For example, sites that need a great deal of scalability can choose to use a threaded MPM like worker or event, while sites requiring stability or compatibility with older software can use a prefork.

So, a good goal is to move a setup from the pre-fork (non-threaded) MPM to the threaded worker MPM.

With SiteMinder, the pre-fork module also causes several additional issues, mainly:

Many excess socket connections from the Web Server to the Policy Server:

Apache creates separate connections to the Policy Server for each child process, and it tends to replace those processes regularly; at peak times a process will often handle one transaction and then exit.

Each process connects as specified in the Host Config Object, so a process may start up, perform a handshake with the Policy Server, open 2 sockets to each Policy Server in the cluster for load balancing, process one request, and then exit.

It isn't unusual to have 90-200 Apache processes active at one time, each with 2 connections to each Policy Server, with the Web Server cycling through 40 different Apache child processes for each process slot.
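Because every child process is its own Web Agent instance, the connection count is a straight product of the factors above. The following is a minimal Python sketch of that arithmetic, following the text's figure of 2 sockets to each Policy Server; the function name and the farm of four Web Servers are hypothetical.

```python
def policy_server_connections(
    child_processes: int,
    sockets_per_server: int,
    policy_servers: int,
    web_servers: int = 1,
) -> int:
    """Estimate agent-to-Policy-Server sockets for a pre-fork deployment.

    Under pre-fork, every child process opens its own sockets to every
    Policy Server listed in the Host Config Object, so the factors multiply.
    """
    per_web_server = child_processes * sockets_per_server * policy_servers
    return per_web_server * web_servers


# Figures from the text: 90-200 children, 2 sockets per Policy Server,
# assuming 2 Policy Servers in the cluster.
print(policy_server_connections(90, 2, 2))                   # 360
print(policy_server_connections(200, 2, 2))                  # 800
# Hypothetical farm of four identical Web Servers.
print(policy_server_connections(200, 2, 2, web_servers=4))   # 3200
```

Since the factors multiply, halving any one of them (for example the per-process socket count set in the HCO) halves the total.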
The Policy Server will do a lot of additional processing:

Each initial connect request from a Web Agent is a high priority message, and a new Policy Server thread is allocated to handle it (up to a maximum).

So, in these Apache pre-fork environments the Policy Server often has MAX_THREADS allocated, even though most of those threads are idle.

In addition, there is the overhead of handling the handshake for each process.
A lot of sockets on the Policy Server are in the TIME_WAIT state:

SiteMinder is geared to expect the repeated use of the same socket.

When Apache ends a child process, it does not appear to send a final close to the Policy Server. As a rule, therefore, with Apache pre-fork clients the Policy Server tends to have many TIME_WAIT connections and many IDLE timeout messages, because the Policy Server only closes these connections when their 10-minute idle timeout expires.

This can strain both the SiteMinder MAX_CONNECTIONS setting and the hardware: although a lot of sockets are open, most are idle and simply waiting to be closed.
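To check whether this is happening, the sockets on the Policy Server host can be counted by TCP state. This is a minimal Python sketch, assuming a Linux Policy Server host and the ports 44441-44443 from the example Host Config Object; it reads the kernel's /proc/net/tcp table directly (IPv4 only; IPv6 sockets are listed in /proc/net/tcp6), and the function name is hypothetical.

```python
from collections import Counter
from pathlib import Path

# State codes as printed in /proc/net/tcp (Linux kernel TCP states).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

# Assumption: the Policy Server listens on the ports from the example HCO.
POLICY_SERVER_PORTS = {44441, 44442, 44443}


def count_states(lines, ports=POLICY_SERVER_PORTS):
    """Count sockets per TCP state whose local port is a Policy Server port."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Data rows start with "N:"; this skips the header and blank lines.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        # local_address is HEXIP:HEXPORT; the port is printed as plain hex.
        local_port = int(fields[1].split(":")[1], 16)
        if local_port in ports:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


if __name__ == "__main__":
    tcp = Path("/proc/net/tcp")  # Linux only
    if tcp.exists():
        for state, n in sorted(count_states(tcp.read_text().splitlines()).items()):
            print(f"{state:12} {n}")
```

A large ESTABLISHED count that drops every idle-timeout period, together with a steady TIME_WAIT count, matches the pattern described above.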
A restart of the Policy Server can cause a flood of new connection requests:

Since each process is independent, when a Policy Server restarts, all of them (perhaps 100 on each Web Server) try to reconnect to it. This can cause an avalanche of new connections that overloads the Policy Server.

What then tends to happen is that the clients time out while their sockets remain open, and each client tries a new connection, which usually makes the startup worse.

If the Policy Server fails several times before it restarts successfully, then this may be what is happening.

Why is worker mode better?

Apache in "worker" mode still uses processes, but each process also runs about 20 worker threads. The thread pool can grow and shrink, but the number of processes is much smaller, which is what matters from a SiteMinder viewpoint.

There is only one handshake per process, and all the threads in a process share its connections to the Policy Server, so those connections are actually re-used and are load balanced among the Policy Servers.
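The difference can be put in numbers. This is a minimal Python sketch comparing the two modes for the same load, assuming 100 concurrent requests, the text's figure of about 20 threads per worker process, and 2 sockets to each of 2 Policy Servers; the function names are hypothetical, and the worker figure is only a floor, because a busy worker process can grow its socket pool towards maxsocketsperport.

```python
import math


def agent_instances(concurrent_requests: int, threads_per_process: int) -> int:
    """Apache processes (and so Web Agent handshakes) needed for a load.

    pre-fork: one request per process, so threads_per_process is 1.
    worker:   each process serves threads_per_process requests at once.
    """
    return math.ceil(concurrent_requests / threads_per_process)


def min_connections(processes: int, sockets_per_server: int, policy_servers: int) -> int:
    """Minimum sockets held open from the Web Server to the Policy Servers."""
    return processes * sockets_per_server * policy_servers


for mode, threads in (("pre-fork", 1), ("worker", 20)):
    procs = agent_instances(100, threads)
    print(f"{mode}: {procs} processes, {min_connections(procs, 2, 2)} connections")
```

With these assumptions pre-fork needs 100 handshakes and 400 sockets, while worker needs 5 handshakes and starts from 20 sockets that all of its threads share.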
Why is Apache pre-fork the default?

Unfortunately, out of the box, Apache is compiled on Linux as a pre-fork build. Pre-fork works on all Linux platforms because it does not use any threads.

Thread support is, however, now fairly consistent across most platforms, so hopefully the default will be changed in the near future.
Stuck with pre-fork for now, what can be done?

The best solution is to use Apache worker mode rather than pre-fork; however, that usually requires rebuilding the Apache executable.

If you are stuck with pre-fork mode for a while, there are two major settings that help reduce the problems it causes:

1. Change the Host Config Object
  
  Currently, the Agent Host Config Object is configured like this:
  
  hostname='<name>'
  maxsocketsperport='20'
  minsocketsperport='2'
  newsocketstep='2'
  policyserver='<policy_server1>,44441,44442,44443'
  policyserver='<policy_server2>,44441,44442,44443'
  requesttimeout='60'
  
  The minsocketsperport and newsocketstep settings should be reduced to 1.
  
  minsocketsperport='1'
  newsocketstep='1'
  
  This ensures that each of your 100 Apache child processes initially establishes only 1 connection to each Policy Server rather than 2, since each process handles only one request at a time.
  
  There is no advantage in having more than one connection to each Policy Server.
  
  That will halve the connections.
  
  In addition, where it is possible to set the Host Config Object to perform failover rather than load balancing, it is often easiest to have two or more HCO objects, each failing over through the Policy Server farm in a different sequence.
  
  That will further reduce the startup time for each new Apache process, and the number of connections.
2. Change the socket timeout
  
  The default idle timeout is 10 minutes. Since the majority of these socket connections are abandoned Apache ones waiting for that timeout, timing them out sooner reduces the overall number of sockets open at one time.
  
  Halving the timeout to 5 minutes reduces the total number of TIME_WAIT sockets resulting from IDLE connections from the Web Agents.
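The combined effect of the two settings can be estimated with Little's law (average number in a system = arrival rate × time each item stays). This is a minimal Python sketch under stated assumptions: child processes exit at a steady, hypothetical rate of 30 per minute, each leaves all its sockets idle until the Policy Server's idle timeout closes them, and there are 2 Policy Servers; the function name is hypothetical, and the model ignores sockets that are re-used before they time out.

```python
def abandoned_sockets(
    exits_per_minute: float,
    sockets_per_server: int,
    policy_servers: int,
    idle_timeout_minutes: float,
) -> float:
    """Little's law: idle backlog = exit rate x sockets per exit x time held."""
    sockets_per_exit = sockets_per_server * policy_servers
    return exits_per_minute * sockets_per_exit * idle_timeout_minutes


before = abandoned_sockets(30, 2, 2, 10)  # minsocketsperport='2', 10-minute timeout
after = abandoned_sockets(30, 1, 2, 5)    # minsocketsperport='1', 5-minute timeout
print(before, after, before / after)      # 1200 300 4.0
```

Under this model each change halves the idle backlog on its own, so applying both cuts it to a quarter.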

Additional Information

(1) Apache HTTP Server documentation: Multi-Processing Modules (MPMs)