A running Web Agent went offline while a Policy Server request was in progress, which caused an outage on the Policy Server.
This can happen, for example, during a network outage or when a network component fails.
Because the Web Agent can't notify the Policy Server of the communication failure, the Policy Server keeps waiting for data from the Web Agent. When multiple requests from one or more Web Agents are lost this way, the Policy Server can become unresponsive, because the worker threads handling those requests are never released. In the Policy Server logs, the symptoms appear as failed authentications or authorizations, or as growing connection queues.
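To make the thread-exhaustion mechanism concrete, here is a toy model in Python built on a fixed-size `ThreadPoolExecutor`. It is an illustration under simplifying assumptions, not how the Policy Server actually schedules work, and the function name is hypothetical. Each "lost" request is a task waiting on an `Event` that stands in for Web Agent data that never arrives; once every worker is held this way, a new request can only sit in the queue.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def fresh_request_starved(workers: int, lost_requests: int, wait_seconds: float = 0.3) -> bool:
    """Return True if a new request is still waiting for a worker after wait_seconds.

    Each lost request occupies a worker that waits for Web Agent data that
    never arrives, modeled by an Event that stays unset during the check.
    """
    agent_data = threading.Event()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(lost_requests):
            pool.submit(agent_data.wait)  # worker stuck waiting for the vanished agent
        fresh = pool.submit(lambda: "authorized")  # a new, unrelated request
        done, _ = wait([fresh], timeout=wait_seconds)
        starved = fresh not in done
        agent_data.set()  # release the stuck workers so the pool can shut down cleanly
    return starved


if __name__ == "__main__":
    print(fresh_request_starved(workers=2, lost_requests=2))  # True: no free worker
    print(fresh_request_starved(workers=3, lost_requests=2))  # False: one worker still free
```

The model only shows why unreleased workers eventually block new work; in the real system, the number of lost requests needed depends on the Policy Server's thread configuration.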
In addition, when a firewall separates the Web Agent from the Policy Server, the agent can return a 500 error when a page is accessed.
Policy Server: all supported versions
Web Agents: 12.51 and later
In R6SP6, R12 SP3, R12.5x, R12.6, R12.7, R12.8, and later, you can create and enable the SM_ENABLE_TCP_KEEPALIVE (SiteMinder Enable TCP Keep Alive) environment variable. This variable lets the Policy Server send KeepAlive packets on Web Agent connections that appear idle from the Policy Server's side (1)(2).
The initial wait period before the first packet, and the interval at which the Policy Server sends subsequent packets, are determined by configurable, OS-specific TCP/IP parameters.
For more information about configuring these TCP/IP parameters, refer to your operating system's documentation.
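At the socket level, this behavior appears to correspond to the standard TCP keepalive option. The Python sketch below enables `SO_KEEPALIVE` on a socket; with nothing else set, the kernel's system-wide parameters decide the initial wait and the probe interval, as described above. The per-socket `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` overrides are platform-dependent (mainly Linux), and nothing in this article says the Policy Server uses them. The function name `enable_keepalive` is hypothetical.

```python
import socket


def enable_keepalive(sock: socket.socket, idle=None, interval=None, count=None) -> None:
    """Turn on TCP keepalive for one socket.

    With only SO_KEEPALIVE set, the OS-wide TCP/IP parameters decide the
    initial wait and the probe interval. Optional per-socket overrides are
    applied only where the platform exposes them.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    overrides = (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the connection is dropped
    )
    for name, value in overrides:
        if value is not None and hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s)  # rely on the OS-wide defaults only
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
```

Leaving the overrides unset mirrors the article's model, in which the timing is tuned once per host through OS parameters rather than per connection.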
To configure the Policy Server to send KeepAlive packets to idle Web Agent connections, log in to the Policy Server host system and do one of the following:
Note: The value must be 0 (disabled) or 1 (enabled). If a value other than 0 or 1 is configured, the environment variable is disabled. If the environment variable is disabled, the Policy Server does not send KeepAlive packets to idle Web Agent connections.
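The rule in the note above can be expressed as a small predicate. This is a hypothetical Python helper that restates the documented behavior; it is not SiteMinder code. It assumes that an unset variable also means "disabled" and that the value is compared exactly, with no whitespace trimming.

```python
import os
from typing import Mapping

ENV_VAR = "SM_ENABLE_TCP_KEEPALIVE"


def keepalive_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Apply the documented rule: only "1" enables KeepAlive packets.

    "0" disables them explicitly; any other value is treated as disabled,
    and (by assumption) so is an unset variable.
    """
    return env.get(ENV_VAR) == "1"


if __name__ == "__main__":
    print(keepalive_enabled())  # checks the current process environment
```

Passing an explicit mapping makes it easy to check the edge cases, for example `keepalive_enabled({"SM_ENABLE_TCP_KEEPALIVE": "2"})` evaluates to `False`.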
Later, R6SP6 CR8 and R12 SP3 CR8 introduced a related fix that further improves connection management. It addresses a case in which a Policy Server thread hangs in TCP recv() and the Policy Server stops responding to requests. The hung thread holds a read lock while it waits in recv(), and another thread is waiting for the write lock. Because a write-lock request is pending, no other thread waiting for the read lock is granted access. The condition clears only when the recv() call returns, at which point the Policy Server recovers.
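The lock contention described above can be reproduced with a small model. The sketch below is a hypothetical writer-preferring read-write lock in Python; it is not the Policy Server's implementation, and all class and function names are illustrative. One thread holds the read lock while "blocked in recv()" (modeled with an `Event`), a writer queues behind it, and a new reader then times out until the simulated recv() returns.

```python
import threading
import time


class WriterPreferringRWLock:
    """Read-write lock that makes new readers wait while any writer is queued."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0          # threads currently holding the read lock
        self._writer = False       # True while a writer holds the lock
        self._writers_waiting = 0  # writers queued for the write lock

    def acquire_read(self, timeout=None) -> bool:
        with self._cond:
            # Defer to an active or pending writer: this is what starves new readers.
            ok = self._cond.wait_for(
                lambda: not self._writer and self._writers_waiting == 0, timeout
            )
            if ok:
                self._readers += 1
            return ok

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self) -> None:
        with self._cond:
            self._writers_waiting += 1
            self._cond.wait_for(lambda: not self._writer and self._readers == 0)
            self._writers_waiting -= 1
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()

    def writers_waiting(self) -> int:
        with self._cond:
            return self._writers_waiting


def simulate_hung_recv() -> tuple:
    """Return (new_reader_blocked, new_reader_recovered)."""
    lock = WriterPreferringRWLock()
    reader_holds = threading.Event()
    recv_returns = threading.Event()  # stands in for recv() finally returning

    def hung_reader() -> None:
        lock.acquire_read()
        reader_holds.set()
        recv_returns.wait()  # simulates blocking in recv() while holding the read lock
        lock.release_read()

    def writer() -> None:
        lock.acquire_write()
        lock.release_write()

    t_reader = threading.Thread(target=hung_reader)
    t_reader.start()
    reader_holds.wait()
    t_writer = threading.Thread(target=writer)
    t_writer.start()
    while lock.writers_waiting() == 0:  # wait until the writer is queued
        time.sleep(0.01)

    blocked = not lock.acquire_read(timeout=0.2)  # a new request cannot get the read lock
    recv_returns.set()  # recv() returns, so the hung reader releases its lock
    t_reader.join()
    t_writer.join()

    recovered = lock.acquire_read(timeout=1.0)
    if recovered:
        lock.release_read()
    return blocked, recovered


if __name__ == "__main__":
    print(simulate_hung_recv())  # (True, True)
```

Writer preference is what turns one hung reader into a stall for everyone: without it, new readers could keep entering, but the writer could then be starved instead.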
The R12.51 Web Agent Release Notes also include the topic "Enable KeepAlives When Agents and Policy Servers are separated by a Firewall." If a firewall separates the agent from the Policy Server and the agent returns a 500 error when a page is accessed, also set SM_ENABLE_TCP_KEEPALIVE on the agent host, following the same Windows and UNIX steps used for the Policy Server.
Since R12.52, setting this variable is also recommended on hosts running an Application Server Agent (ASA), the Administrative UI, or a custom agent built with the SDK.