12.9 SiteMinder Access Gateway becomes unresponsive under certain conditions

Article ID: 421073

Updated On:

Products

SITEMINDER

Issue/Introduction

When one of the policy servers goes down, it may trigger a race condition that causes the 12.9 SiteMinder Access Gateway to become unresponsive.

When this happens, the 12.9 gateway processes (httpd and java) are still running, but the gateway agent stops sending requests to the remaining policy servers that are still up.

The environment variable SM_ENABLE_TCP_KEEPALIVE is enabled.

The only way to recover from this situation is to restart the Access Gateway.

Environment

OS: ANY

Access Gateway: 12.9

Cause

The race condition occurs when two threads close the same socket connection (file descriptor) to the same policy server.

The socket was opened and closed within a very short window by two different threads.

Because the socket [FD#137] was closed by a different thread, it became unavailable to the thread that originally opened it.
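
The hazard described above can be illustrated with a minimal Python sketch. It is not the agent's code (the agent is native code, and its fix is not published here); it only shows why a second close on the same descriptor fails with error 9 (EBADF), the same error number reported by CloseConnection in the pre-fix trace, and one common way to make close safe when several threads may call it. The names double_close_errno, GuardedConnection, and race_guarded_close are hypothetical and exist only for this illustration.

```python
import errno
import os
import socket
import threading


def double_close_errno() -> int:
    """Close the same descriptor twice; return the errno of the second close."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    fd = sock.detach()   # take the raw descriptor; the socket object no longer owns it
    os.close(fd)         # the first closer succeeds
    try:
        os.close(fd)     # the second closer acts on a descriptor that is already gone
    except OSError as exc:
        return exc.errno
    return 0


class GuardedConnection:
    """Hypothetical wrapper: only one caller ever reaches the real close."""

    def __init__(self) -> None:
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._lock = threading.Lock()
        self._closed = False

    def close(self) -> bool:
        """Return True only for the caller that actually closed the socket."""
        with self._lock:
            if self._closed:
                return False
            self._closed = True
            self._sock.close()
            return True


def race_guarded_close(workers: int = 8) -> int:
    """Let several threads race to close one connection; count the real closes."""
    conn = GuardedConnection()
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(conn.close()))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(results)


if __name__ == "__main__":
    print("second close failed with:", errno.errorcode[double_close_errno()])
    print("real closes under the guard:", race_guarded_close())
```

In a multi-threaded process the unguarded case can be worse than a plain error: the operating system may hand the freed descriptor number to a new socket before the second close runs, so the late close can tear down a connection that belongs to another thread.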

Resolution

A code fix from Broadcom engineering is required. Alternatively, upgrade the 12.9 gateway to a later patched release.

Web agent trace prior to the code fix:

[mm/dd/yyyy][hh:mm:ss.517][118346][140705942992448][][SmClient.cpp:3246][Revive][Entered Revive()]
[mm/dd/yyyy][hh:mm:ss.517][118346][140705942992448][][SmClient.cpp:1902][CSmServerHandle::StateTransition][Server State transition from INACTIVE to INTER]
[mm/dd/yyyy][hh:mm:ss.517][118346][140705942992448][][SmClient.cpp:867][CSmServerHandle::InitConnections][Now creating connections]
[mm/dd/yyyy][hh:mm:ss.517][118346][140705942992448][][SmClient.cpp:1231][CreateConnection][Create Connection]
[mm/dd/yyyy][hh:mm:ss.517][118346][140705942992448][][SmClient.cpp:1287][CreateConnection][SM_ENABLE_TCP_KEEPALIVE is enabled.]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:3631][CheckWritableConnections][ERROR: getsockopt returned that 0th socket [FD#137] is not reachable]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:3747][CheckConnectionStatus][select returned nSelectStatus=1]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:3829][SetBlockingMode][Failed to set non blockig mode.]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:1440][DoHandShake][Performing HandShake]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmAgentConfigFile.cpp:1190][CSmConfigFileManager::GetSharedSecretFromMemory][Successfully got the shared secret from memory]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:1462][DoHandShake][Handshake error while creating connection.]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:659][CSmConnection::CloseConnection][Failed to setsockopt, error(9).]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:667][CSmConnection::CloseConnection][Failed to closesocket, error(9).]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:1170][ReleaseConnection][Release Connection]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:994][CSmServerHandle::CreateConnections][Failed to create connection.]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:3339][Revive][Failed to creat connections, transitioning to Inactive State]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:1902][CSmServerHandle::StateTransition][Server State transition from INTER to INACTIVE]
[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:3348][Revive ][Exiting Revive activeCount=7]
[mm/dd/yyyy][hh:mm:ss.547][118346][140705942992448][][SmAgentAPI.cpp:687][ConnectionService][Request handler statistics]

Web agent trace after the code fix:

[mm/dd/yyyy][hh:mm:ss.461][3213499][140265400563264][][SmClient.cpp:605][PrintStats][Server wait queue statistics]
[mm/dd/yyyy][hh:mm:ss.461][3213499][140265400563264][][SmClient.cpp:3254][Revive][Entered Revive()]
[mm/dd/yyyy][hh:mm:ss.461][3213499][140265400563264][][SmClient.cpp:1912][CSmServerHandle::StateTransition][Server State transition from INACTIVE to INTER]
[mm/dd/yyyy][hh:mm:ss.461][3213499][140265400563264][][SmClient.cpp:875][CSmServerHandle::InitConnections][Now creating connections]
[mm/dd/yyyy][hh:mm:ss.461][3213499][140265400563264][][SmClient.cpp:1240][CreateConnection][Create Connection]
[mm/dd/yyyy][hh:mm:ss.461][3213499][140265400563264][][SmClient.cpp:1297][CreateConnection][SM_ENABLE_TCP_KEEPALIVE is enabled.]
[mm/dd/yyyy][hh:mm:ss.471][3213499][140265400563264][][SmClient.cpp:3646][CheckWritableConnections][ERROR: getsockopt returned that 0th socket [FD#135] has an error: 111]
[mm/dd/yyyy][hh:mm:ss.471][3213499][140265400563264][][SmClient.cpp:3783][CheckConnectionStatus][select returned nSelectStatus=-1]
[mm/dd/yyyy][hh:mm:ss.471][3213499][140265400563264][][SmClient.cpp:3800][CheckConnectionStatus][SOCKET_ERROR]
[mm/dd/yyyy][hh:mm:ss.471][3213499][140265400563264][][SmClient.cpp:1362][CreateConnection][No server is ready(nMaxFd = 0).Closing the socket connection. returning err 1]
[mm/dd/yyyy][hh:mm:ss.471][3213499][140265400563264][][SmClient.cpp:1427][CreateConnection][Connect error while creating connection]
[mm/dd/yyyy][hh:mm:ss.471][3213499][140265400563264][][SmClient.cpp:1179][ReleaseConnection][Release Connection]
[mm/dd/yyyy][hh:mm:ss.471][3213499][140265400563264][][SmClient.cpp:1003][CSmServerHandle::CreateConnections][Failed to create connection.]
[mm/dd/yyyy][hh:mm:ss.471][3213499][140265400563264][][SmClient.cpp:3347][Revive][Failed to creat connections, transitioning to Inactive State]
[mm/dd/yyyy][hh:mm:ss.471][3213499][140265400563264][][SmClient.cpp:1912][CSmServerHandle::StateTransition][Server State transition from INTER to INACTIVE]

Note: Errors in the web agent trace are normal and expected when the policy server is not available, even with the fix applied.
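
When reviewing a long trace, it can help to scan for the entries that differ between the two samples. In the traces shown in this article, only the pre-fix sample contains CloseConnection entries that report error(9). The Python sketch below parses trace lines in the bracketed layout shown above and flags that pattern. The field names (pid, tid, txn, and so on) are assumptions inferred from the sample lines, not a documented format, and treating error(9) in CloseConnection as an indicator of this issue is likewise an assumption based only on these two samples.

```python
import re
from typing import Iterable, Optional

# Layout assumed from the samples:
# [date][time][pid][tid][txn][source:line][function][message]
TRACE_LINE = re.compile(
    r"^\[(?P<date>[^\]]*)\]\[(?P<time>[^\]]*)\]\[(?P<pid>\d+)\]\[(?P<tid>\d+)\]"
    r"\[(?P<txn>[^\]]*)\]\[(?P<source>[^\]]*)\]\[(?P<function>[^\]]*)\]"
    r"\[(?P<message>.*)\]$"  # greedy: the message itself may contain brackets
)

# Sample lines copied from the traces in this article.
PRE_FIX_SAMPLE = (
    "[mm/dd/yyyy][hh:mm:ss.533][118346][140705942992448][][SmClient.cpp:667]"
    "[CSmConnection::CloseConnection][Failed to closesocket, error(9).]"
)
POST_FIX_SAMPLE = (
    "[mm/dd/yyyy][hh:mm:ss.471][3213499][140265400563264][][SmClient.cpp:3646]"
    "[CheckWritableConnections][ERROR: getsockopt returned that 0th socket "
    "[FD#135] has an error: 111]"
)


def parse_trace_line(line: str) -> Optional[dict]:
    """Split one trace line into named fields, or return None if it does not match."""
    match = TRACE_LINE.match(line.strip())
    return match.groupdict() if match else None


def has_bad_descriptor_close(lines: Iterable[str]) -> bool:
    """Return True if any CloseConnection entry reports error(9)."""
    for line in lines:
        entry = parse_trace_line(line)
        if entry is None:
            continue  # skip blank or unrecognized lines
        if "CloseConnection" in entry["function"] and "error(9)" in entry["message"]:
            return True
    return False


if __name__ == "__main__":
    print("pre-fix sample flagged:", has_bad_descriptor_close([PRE_FIX_SAMPLE]))
    print("post-fix sample flagged:", has_bad_descriptor_close([POST_FIX_SAMPLE]))
```

A match is only a prompt for closer inspection of the surrounding entries, not a diagnosis on its own.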