search cancel

Siteminder policy server failover mechanism to Oracle RDS Policy Store (Delay)

book

Article ID: 257567

calendar_today

Updated On:

Products

CA Single Sign On Agents (SiteMinder)

Issue/Introduction

Siteminder Policy Server Failover delay to Oracle Policy store (same applies for user Store) 

Use case:

- Policy store / key Store (same DB Instance) running on Oracle 19c 
- The Defined Data source for the Policy Store is using AWS load balancer   as the host for the DB which takes care of sending the traffic to a primary DB and in case of a failover sends to the backup DB
- when primary DB is taken down, The AWS load balancer then failed over to the secondary DB 
- From Siteminder side , it was observed that no failover or Failback messages reported in the smps log / trace for the Data sources (same data source in Failover mode) 
- Policy server seems to recover after 5 min or so.
- some of the transactions were successful and others were not 

Environment

Release : 12.8.x

Resolution

 

########### Troubleshooting and Analysis #############

#### Testing Results with out of the box Driver settings 

- Siteminder had the following Registry keys set 

LoginTimeout=        0xf;    REG_DWORD     --> 15 seconds 
QueryTimeout=       0x1e;    REG_DWORD     --> 30 seconds 
ConnectionTimeout=  0x41;    REG_DWORD     --> 65 seconds    

- From the Policy server trace and driver snoop logs (see end of the article on how to enable ODBC Driver snoop logs) , it was observed  that the delay is happening on the already established handles (connections to Store prior of DB Failover) were 3 Retry Attempts on the Already established handle which took 1 min and 30 seconds each to report the -1 before giving up finally and creating a new handle.

below is the Isolated traces from policy server and ODBC snoop showing the issue. The ODBC Driver is not honoring the 30 seconds querytimeout passed by Siteminder  

 

 ** From odbc trace , this was on SQLExecute where call started at 13:07:53 and did not return till 13:09 from Driver reported -1    which is exactly 1min and 30 seconds before Driver gave up 
    
    Line 120476: 2022-03-28 13:07:53.307364 ppid=29829:pid=29831:tid=140692793849600                                            ENTER SQLExecute 
    Line 120478: 2022-03-28 13:07:53.307364 ppid=29829:pid=29831:tid=140692793849600                                            HSTMT               0x007ff58c001690
    Line 121812: 2022-03-28 13:09:23.382440 (Time elapsed in microseconds : 90075076) ppid=29829:pid=29831:tid=140692793849     EXIT  SQLExecute  with return code -1 (SQL_ERROR)
    Line 121814: 2022-03-28 13:09:23.382440 (Time elapsed in microseconds : 90075076) ppid=29829:pid=29831:tid=140692793849     HSTMT               0x007ff58c001690
    
    ** Then we can see at [13:10:53.470] , the thread again reported a network issue 
    
    [03/28/2022][13:10:53.470][13:10:53][29831][140692793849600][CSmDbODBC.cpp:1472][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][HY000][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]Network Operation Timed Out.][][][][][][][SQL Error.]
    [03/28/2022][13:10:53.470][13:10:53][29831][140692793849600][CSmDbODBC.cpp:1521][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][HY000][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]Network Operation Timed Out.][][][][][][][SQL Server BUSY or State with Error, so Retrying]
    [03/28/2022][13:10:53.470][13:10:53][29831][140692793849600][CSmDbODBC.cpp:1555][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][][][][][][][][][][SQL Still Executing (Not expected if using syncronous connections.]
    
    IF we look at the ODBC trace , we can see that actually at 13:09:23  after the first -1 above , we can see a retry apparently for the Execution  which also took another 1 min 30 seconds to return - from driver at 13:10:53.470058
    
    Line 121880: 2022-03-28 13:09:23.383173 ppid=29829:pid=29831:tid=140692793849600                                            ENTER SQLExecute 
    Line 121882: 2022-03-28 13:09:23.383173 ppid=29829:pid=29831:tid=140692793849600                                            HSTMT               0x007ff58c001690
    Line 132508: 2022-03-28 13:10:53.470058 (Time elapsed in microseconds : 90086885) ppid=29829:pid=29831:tid=140692793849     EXIT  SQLExecute  with return code -1 (SQL_ERROR)
    Line 132510: 2022-03-28 13:10:53.470058 (Time elapsed in microseconds : 90086885) ppid=29829:pid=29831:tid=140692793849     HSTMT               0x007ff58c001690
    
    
    ** And yet Again another pause in our Trace before we see another Network Error reported at [13:12:23.539 . 
    
    
    [03/28/2022][13:12:23.539][13:12:23][29831][140692793849600][CSmDbODBC.cpp:1472][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][HY000][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]Network Operation Timed Out.][][][][][][][SQL Error.]
    [03/28/2022][13:12:23.539][13:12:23][29831][140692793849600][CSmDbODBC.cpp:1521][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][HY000][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]Network Operation Timed Out.][][][][][][][SQL Server BUSY or State with Error, so Retrying]
    [03/28/2022][13:12:23.539][13:12:23][29831][140692793849600][CSmDbODBC.cpp:1555][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][][][][][][][][][][SQL Still Executing (Not expected if using syncronous connections.]
    
    If you look at the ODBC trace , you can see that there was another SQLExecute that kicked in at 13:10:53 and one more time took 1min and 30 seconds to return the -1 from Driver side at [13:12:23.539]
    
    Line 132576: 2022-03-28 13:10:53.470717 ppid=29829:pid=29831:tid=140692793849600                                            ENTER SQLExecute 
    Line 132578: 2022-03-28 13:10:53.470717 ppid=29829:pid=29831:tid=140692793849600                                            HSTMT               0x007ff58c001690
    Line 163128: 2022-03-28 13:12:23.538768 (Time elapsed in microseconds : 90068051) ppid=29829:pid=29831:tid=140692793849     EXIT  SQLExecute  with return code -1 (SQL_ERROR)

    
    ** also another pause in our Trace which is exactly the same thing as above due to another sqlExecute that took 1min and 30 seconds before driver reported -1
    
    
    [03/28/2022][13:13:53.631][13:13:53][29831][140692793849600][CSmDbODBC.cpp:1973][][][][][][][][][][][][][][][][][][][][][][LogMessage:INFO:[sm-log-00000] Execution time exceeded threshold. (CSmDbConnectionODBC::DoExecute, 360323, 5000, )]
    
    
    Line 163196: 2022-03-28 13:12:23.554194 ppid=29829:pid=29831:tid=140692793849600                                            ENTER SQLExecute 
    Line 163198: 2022-03-28 13:12:23.554194 ppid=29829:pid=29831:tid=140692793849600                                            HSTMT               0x007ff58c001690
    Line 189560: 2022-03-28 13:13:53.631082 (Time elapsed in microseconds : 90076888) ppid=29829:pid=29831:tid=140692793849     EXIT  SQLExecute  with return code -1 (SQL_ERROR)
    
    
    ** Over here , we can see Finally we were able to Inactivate the connection and then if you look at the same ODBC trace thread , we can see a new Connection being opened 
    
    [03/28/2022][13:14:08.633][13:14:08][29831][140692793849600][CSmDbODBC.cpp:210][CSmDbConnectionODBC::MapResult][][][][][][][][][][][][][][][][][][][][][Invalid handle '140692592531088'.]
    [03/28/2022][13:14:08.633][13:14:08][29831][140692793849600][CDb.cpp:285][CSmRecordset::DoSelect][][][][][][][][][][][][][Exception][Internal Error: DB Transport Error. Error code -4017 Native code -1][][][][][][][Finish processing SQL statement.]
    [03/28/2022][13:14:08.633][13:14:08][29831][140692793849600][CSmDbODBC.cpp:4367][CSmDbConnectionODBC::DoClose][][][][][][][][][][][][][][][][][][][][][Invalid Statement handle '140692592531088'.]
    [03/28/2022][13:14:08.633][13:14:08][29831][140692793849600][CSmDbODBC.cpp:210][CSmDbConnectionODBC::MapResult][][][][][][][][][][][][][][][][][][][][][Invalid handle '140692592531088'.]
    [03/28/2022][13:14:08.633][13:14:08][29831][140692793849600][CSmDbODBC.cpp:3397][CSmDbConnectionODBC::AssignConnection][][][][][][][][][][][][][][][][][][][][][Invalid Statement handle '140692592531088'.]
    [03/28/2022][13:14:08.633][13:14:08][29831][140692793849600][CSmDbConnection.cpp:241][CSmDbConnection::ReleaseReference][][][][][][][][][][][][][][][][][][][][][Release connection reference. ReferenceCount = '7']
    [03/28/2022][13:14:08.633][13:14:08][29831][140692793849600][CSmDbConnection.cpp:200][CSmDbConnection::MakeInactive][][][][][][][][][][][][][][][][][][][][][Inactivate connection.]
    
    
    here is the snippet from ODBC 
    
    Line 197492: 2022-03-28 13:14:08.634537 ppid=29829:pid=29831:tid=140692793849600                                            ENTER SQLAllocHandle 
    Line 197494: 2022-03-28 13:14:08.634537 ppid=29829:pid=29831:tid=140692793849600                                            SQLSMALLINT                  3 <SQL_HANDLE_STMT>
    Line 197496: 2022-03-28 13:14:08.634537 ppid=29829:pid=29831:tid=140692793849600                                            SQLHANDLE           0x007ff5cc000ef0
    Line 197498: 2022-03-28 13:14:08.634537 ppid=29829:pid=29831:tid=140692793849600                                            SQLHANDLE *         0x007ff58c0015d0
    Line 197500: 2022-03-28 13:14:08.634609 (Time elapsed in microseconds :       72) ppid=29829:pid=29831:tid=140692793849     EXIT  SQLAllocHandle  with return code 0 (SQL_SUCCESS)

    and then Finally the SQLExute succeeded 
    
    Line 197548: 2022-03-28 13:14:08.634813 ppid=29829:pid=29831:tid=140692793849600                                            ENTER SQLPrepareW 
    Line 197550: 2022-03-28 13:14:08.634813 ppid=29829:pid=29831:tid=140692793849600                                            HSTMT               0x007ff58c001690
    Line 197552: 2022-03-28 13:14:08.634813 ppid=29829:pid=29831:tid=140692793849600                                            WCHAR *             0x007ff58c00ee58 [4294967293] "SELECT smagentcommandoid, smtimestamp, smcommand, smdata  FROM smagentcommand4 WHERE smtimestamp >= 1648412639\x00"
    Line 197554: 2022-03-28 13:14:08.634813 ppid=29829:pid=29831:tid=140692793849600                                            SDWORD                    -3
    Line 197556: 2022-03-28 13:14:08.634886 (Time elapsed in microseconds :       73) ppid=29829:pid=29831:tid=140692793849     EXIT  SQLPrepareW  with return code 0 (SQL_SUCCESS)
    Line 197558: 2022-03-28 13:14:08.634886 (Time elapsed in microseconds :       73) ppid=29829:pid=29831:tid=140692793849     HSTMT               0x007ff58c001690
    Line 197560: 2022-03-28 13:14:08.634886 (Time elapsed in microseconds :       73) ppid=29829:pid=29831:tid=140692793849     WCHAR *             0x007ff58c00ee58 [4294967293] "SELECT smagentcommandoid, smtimestamp, smcommand, smdata  FROM smagentcommand4 WHERE smtimestamp >= 1648412639\x00"
    Line 197562: 2022-03-28 13:14:08.634886 (Time elapsed in microseconds :       73) ppid=29829:pid=29831:tid=140692793849     SDWORD                    -3
    Line 197564: 2022-03-28 13:14:08.634927 ppid=29829:pid=29831:tid=140692793849600                                            ENTER SQLExecute 
    Line 197566: 2022-03-28 13:14:08.634927 ppid=29829:pid=29831:tid=140692793849600                                            HSTMT               0x007ff58c001690
    Line 197568: 2022-03-28 13:14:08.643590 (Time elapsed in microseconds :     8663) ppid=29829:pid=29831:tid=140692793849     EXIT  SQLExecute  with return code 0 (SQL_SUCCESS)
    

#### Testing Results with InterruptTimeout  recommended by Progress  DEV Team

- With the Above and After consulting with Progress Team (DataDirect Driver used by Siteminder), it was recommended to use the InterruptTimeout  in the Data Source to see if it helps .

The default for query timeout is 0 which means that the query does not time out, but the driver responds to the SQL_ATTR_QUERY_TIMEOUT attribute which is set to 30 seconds.  So, QueryTimeout doesn't need to be added to the ODBC data source.

Using the positive value for InterruptTimeout causes the driver to stop waiting for the cancel acknowledgement and return to the application.  In the customer's situation, the query timeout is triggered but the database is not available to return a cancel acknowledgement to the driver and that is most likely why it takes so long to respond to the failover.

Using the negative value for InterruptTimeout causes the driver to stop waiting for the cancel acknowledgement and kill the socket, which drops the connection.  

InterruptTimeout set to a small positive value, say 5 seconds:
 
    QueryTimeout                30
    InterruptTimeout             5
    Should return to app in 35 seconds  
 
If that doesn't help, have them set it to a negative value:
 
InterruptTimeout set to a small negative value, say -5 seconds: 
 
    QueryTimeout                30
    InterruptTimeout             -5
    Socket will be killed in 35 seconds (the connection will be gone)

Same test was performed and an improvement was observed in the Failover where this time it was taking 40 seconds this time for each Retry instead of the 90 seconds observed previously, so the InterruptTimeout added and set to 5 in the Data Srource.

However , we should be expecting to have 30 Query timeout + 5 seconds InterruptTimeout = 35 seconds for each retry and yet the results where showing 40 seconds .

05/25/2022][10:33:45.047][10:33:45][1021][139814171031296][CDb.cpp:312][CSmRecordset::Execute][][][][][][][][][][][][][][][][][][][][][Start processing SQL statement.]
[05/25/2022][10:34:25.086][10:34:25][1021][139814171031296][CSmDbODBC.cpp:1472][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][HY000][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]Network Operation Timed Out.][][][][][][][SQL Error.]
[05/25/2022][10:34:25.086][10:34:25][1021][139814171031296][CSmDbODBC.cpp:1521][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][HY000][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]Network Operation Timed Out.][][][][][][][SQL Server BUSY or State with Error, so Retrying]
[05/25/2022][10:34:25.086][10:34:25][1021][139814171031296][CSmDbODBC.cpp:1555][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][][][][][][][][][][SQL Still Executing (Not expected if using syncronous connections.]
[05/25/2022][10:35:05.124][10:35:05][1021][139814171031296][CSmDbODBC.cpp:1472][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][HY000][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]Network Operation Timed Out.][][][][][][][SQL Error.]
[05/25/2022][10:35:05.124][10:35:05][1021][139814171031296][CSmDbODBC.cpp:1521][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][HY000][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]Network Operation Timed Out.][][][][][][][SQL Server BUSY or State with Error, so Retrying]
[05/25/2022][10:35:45.152][10:35:45][1021][139814171031296][CSmDbODBC.cpp:1472][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][HY000][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]Network Operation Timed Out.][][][][][][][SQL Error.]
[05/25/2022][10:35:45.152][10:35:45][1021][139814171031296][CSmDbODBC.cpp:1521][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][HY000][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]Network Operation Timed Out.][][][][][][][SQL Server BUSY or State with Error, so Retrying]

2022-05-25 10:33:45.047961 ppid=1019:pid=1021:tid=139814171031296                                              ENTER SQLExecute 
2022-05-25 10:34:25.086375 (Time elapsed in microseconds : 40038414) ppid=1019:pid=1021:tid=13981417103129     EXIT  SQLExecute  with return code -1 (SQL_ERROR)

2022-05-25 10:34:25.086741 ppid=1019:pid=1021:tid=139814171031296                                              ENTER SQLExecute 
2022-05-25 10:35:05.124140 (Time elapsed in microseconds : 40037399) ppid=1019:pid=1021:tid=13981417103129     EXIT  SQLExecute  with return code -1 (SQL_ERROR)

2022-05-25 10:35:05.124572 ppid=1019:pid=1021:tid=139814171031296                                              ENTER SQLExecute 
2022-05-25 10:35:45.151727 (Time elapsed in microseconds : 40027155) ppid=1019:pid=1021:tid=13981417103129     EXIT  SQLExecute  with return code -1 (SQL_ERROR)

2022-05-25 10:35:45.167326 ppid=1019:pid=1021:tid=139814171031296                                              ENTER SQLExecute 
2022-05-25 10:36:25.203728 (Time elapsed in microseconds : 40036402) ppid=1019:pid=1021:tid=13981417103129     EXIT  SQLExecute  with return code -1 (SQL_ERROR)

 

##### Feedback from Progress DEV team on the additional 5 seconds observed after adding the InterruptTimeout  

Progress team reproduced the issue in their Lab and determined that the extra 5 seconds taken by InterruptTimeout is a bug that would be addressed in Progress Defect XDBC-37372.

a DEV fix for the Driver was provided where the driver will now return in QueryTimeout+InterruptTimeout with no extra time taken

 

##### Testing the ODBC Driver DEV fix provided by Progress 

After implementing the DEV fix and testing the same failover scenario ,f rom the logs listed below , you can see that before the Driver was attempting 4 times with each time taking 40 seconds (query timeout was set to 30 and the Interrupt timeout was set to 5 so our assumption was 30+ 5 = 35 but it was taking 40 . The additional 5 seconds is the reason  Progress DEV team gave us the patch that is supposed to remove the 5 seconds delay) 

so we have 4 retries on the query with each taking 40 seconds = 40 + 40 + 40+ 40 = 160 seconds  so roughly around 3 min 

however , if I look into the new logs , the Transaction is failing in around  13 seconds  

If you noticed , the driver on the 3 retries is now taking microseconds to complete .

 

After Driver Update

------------------------------------- 
 

[09/28/2022][19:22:06.291][19:22:06][14489][140100742665984][CDb.cpp:221][CSmRecordset::DoSelect][][][][][][][][][][][][][][][][][][][][][Start processing SQL statement.]
[09/28/2022][19:22:06.291][19:22:06][14489][140100742665984][CDb.cpp:312][CSmRecordset::Execute][][][][][][][][][][][][][][][][][][][][][Start processing SQL statement.]
[09/28/2022][19:22:19.193][19:22:19][14489][140100742665984][CSmDbODBC.cpp:1472][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][08S01][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]Connection Dead.][][][][][][][SQL Error.]
[09/28/2022][19:22:19.193][19:22:19][14489][140100742665984][CSmDbODBC.cpp:1521][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][08S01][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]Connection Dead.][][][][][][][SQL Server BUSY or State with Error, so Retrying]

[09/28/2022][19:22:19.193][19:22:19][14489][140100742665984][CSmDbODBC.cpp:1555][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][][][][][][][][][][SQL Still Executing (Not expected if using syncronous connections.]
[09/28/2022][19:22:19.194][19:22:19][14489][140100742665984][CSmDbODBC.cpp:1472][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][08S01][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]ORA-03114: not connected to ORACLE][][][][][][][SQL Error.]
[09/28/2022][19:22:19.194][19:22:19][14489][140100742665984][CSmDbODBC.cpp:1521][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][08S01][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]ORA-03114: not connected to ORACLE][][][][][][][SQL Server BUSY or State with Error, so Retrying]

[09/28/2022][19:22:19.194][19:22:19][14489][140100742665984][CSmDbODBC.cpp:1555][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][][][][][][][][][][SQL Still Executing (Not expected if using syncronous connections.]
[09/28/2022][19:22:19.194][19:22:19][14489][140100742665984][CSmDbODBC.cpp:1472][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][08S01][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]ORA-03114: not connected to ORACLE][][][][][][][SQL Error.]
[09/28/2022][19:22:19.194][19:22:19][14489][140100742665984][CSmDbODBC.cpp:1521][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][08S01][-1][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]ORA-03114: not connected to ORACLE][][][][][][][SQL Server BUSY or State with Error, so Retrying]
[09/28/2022][19:22:19.194][19:22:19][14489][140100742665984][CSmDbODBC.cpp:1555][CSmDbConnectionODBC::CheckForError][][][][][][][][][][][][][][][][][][][][][SQL Still Executing (Not expected if using syncronous connections.]

[09/28/2022][19:22:19.209][19:22:19][14489][140100742665984][CSmDbODBC.cpp:1973][][][][][][][][][][][][][][][][][][][][][][LogMessage:INFO:[sm-log-00000] Execution time exceeded threshold. (CSmDbConnectionODBC::DoExecute, 12918, 5000, agent= client= server= resource= action= user=)]
[09/28/2022][19:22:19.209][19:22:19][14489][140100742665984][CSmDbODBC.cpp:362][CSmDbConnectionODBC::MapResult][][][][][][][][][][][][-1][][[CA SSO][ODBC Oracle Wire Protocol driver][Oracle]ORA-03114: not connected to ORACLE][][][][][][][ODBC Error: State = 08S01 Internal Code = -3114 - s - MappedResult:-4017]
[09/28/2022][19:22:19.209][19:22:19][14489][140100742665984][CSmDbUtilities.cpp:806][CSmDbMonitoredClass::SetState][][][][][][][][][][][][][][][][][][][][][Connection SiteMinder Data Source: Changing object state 'Active' to state 'Hung'.]
[09/28/2022][19:22:19.209][19:22:19][14489][140100742665984][CSmDbODBC.cpp:567][CSmDbConnectionODBC::CancelSQLCall][][][][][][][][][][][][][][][][][][][][][Cancel executed, Connection marked as hung.]
[09/28/2022][19:22:19.215][19:22:19][14489][140100742665984][CDb.cpp:285][CSmRecordset::DoSelect][][][][][][][][][][][][][Exception][Internal Error: DB Transport Error.  Code is -4017 (DBMSG: <<<State = 08S01 Internal Code = -3114 - s>>>)][][][][][][][Finish processing SQL statement.]
 
 
 
 Line 293434: 2022-09-28 19:22:06.291276 ppid=14486:pid=14489:tid=140100742665984                                         ENTER SQLExecute 
 Line 293436: 2022-09-28 19:22:06.291276 ppid=14486:pid=14489:tid=140100742665984                                         HSTMT               0x007f6ba408f230
 Line 293730: 2022-09-28 19:22:19.193357 (Time elapsed in microseconds : 12902081) ppid=14486:pid=14489:tid=140100742665  EXIT  SQLExecute  with return code -1 (SQL_ERROR)

 
 
 Line 293798: 2022-09-28 19:22:19.193746 ppid=14486:pid=14489:tid=140100742665984                                         ENTER SQLExecute 
 Line 293800: 2022-09-28 19:22:19.193746 ppid=14486:pid=14489:tid=140100742665984                                         HSTMT               0x007f6ba408f230
 Line 293802: 2022-09-28 19:22:19.193780 (Time elapsed in microseconds :       34) ppid=14486:pid=14489:tid=140100742665  EXIT  SQLExecute  with return code -1 (SQL_ERROR)
 
 
 Line 293870: 2022-09-28 19:22:19.194084 ppid=14486:pid=14489:tid=140100742665984                                         ENTER SQLExecute 
 Line 293872: 2022-09-28 19:22:19.194084 ppid=14486:pid=14489:tid=140100742665984                                         HSTMT               0x007f6ba408f230
 Line 293874: 2022-09-28 19:22:19.194108 (Time elapsed in microseconds :       24) ppid=14486:pid=14489:tid=140100742665  EXIT  SQLExecute  with return code -1 (SQL_ERROR)
 
 
 Line 293942: 2022-09-28 19:22:19.209451 ppid=14486:pid=14489:tid=140100742665984                                         ENTER SQLExecute 
 Line 293944: 2022-09-28 19:22:19.209451 ppid=14486:pid=14489:tid=140100742665984                                         HSTMT               0x007f6ba408f230
 Line 293946: 2022-09-28 19:22:19.209515 (Time elapsed in microseconds :       64) ppid=14486:pid=14489:tid=140100742665  EXIT  SQLExecute  with return code -1 (SQL_ERROR)

 

##### Progress team Feedback on the test results of the DEV fix 

We saw a huge improvement in SQL Query failover during the DB Failover after implementing the patch provided by Progress team.

instead of 160 seconds To determine connection is bad (including retries) , now it is taking around 13 seconds .

From the Driver trace  30 seconds in the Query timeout 

ENTER SQLSetStmtAttr 
SQLHSTMT            0x007f6ba408f230
SQLINTEGER                   0 <SQL_ATTR_QUERY_TIMEOUT>
SQLPOINTER                30

before , the driver was taking 40 seconds to mark the SQL Execute Error 

 

2022-05-25 10:33:45.047961 ppid=1019:pid=1021:tid=139814171031296                                              ENTER SQLExecute 
2022-05-25 10:34:25.086375 (Time elapsed in microseconds : 40038414) ppid=1019:pid=1021:tid=13981417103129     EXIT  SQLExecute  with return code -1 (SQL_ERROR)

And then perform 3 other retries each also taking 40 second 

After the Fix , I can see that the SQL Execute is taking 12 seconds to determine the connection is bad and then the other 3 retries are roughly taking 40 microseconds each 

 Line 293434: 2022-09-28 19:22:06.291276 ppid=14486:pid=14489:tid=140100742665984                                         ENTER SQLExecute 
 Line 293436: 2022-09-28 19:22:06.291276 ppid=14486:pid=14489:tid=140100742665984                                         HSTMT               0x007f6ba408f230
 Line 293730: 2022-09-28 19:22:19.193357 (Time elapsed in microseconds : 12902081) ppid=14486:pid=14489:tid=140100742665  EXIT  SQLExecute  with return code -1 (SQL_ERROR)

So you can see there is a huge difference in performance and marking connection as bad .

Couple of questions came out since expectations was still we are going to have 30+5 seconds as explained above after the fix  but it is not the case , it is much faster now.

We sent the below questions to Progress DEV team 

1) If the Query timeout is 30 seconds as seen passed from Siteminder code to Driver , why the Driver is returning Error at the 13 seconds mark ? does it now not wait for the 30 seconds to pass and  rather now report issue as soon as returned by the Database or tcp layer ? 

2) same for the Retries , before it used to take 40 seconds also for each retry , now it is taking around 40 microsecond to report -1 connect error , is it based on the tcp response and no longer wait for the  query timeout ? 

and the Answer was as follows 

Development has responded that it is expected for the driver to return "Connection dead" instead of "Network Operation Timed Out" after the fix. 
As part of the fix, the driver now determines whether the connection is dead during the current ODBC call instead of returning the timeout error and then returning "Connection dead" error after a subsequent ODBC call. 
This means that the driver returns as soon as it determines that the connection is dead, which is why it returns sooner.
 
The drivers that can use InterruptTimeout are DB2, Oracle and Sybase ASE and Sybase IQ.

 

##### Conclusion 

Policy server 12.8 SP7 and lower are impacted by the Failover delay.

Please reach out to Broadcom Siteminder support so the updated DataDrirect Driver patch can be provided based on Defect DE531927 .

The same DataDirect fix will be included in the next Siteminder official release

 

 

Additional Information

####### Enable DataDirect Driver logs #########

Modify your system_odbc.ini file and go to the end and set the tacing to from Trace=0 to Trace=1  which will enable the tracing 

Example 

[ODBC]
Trace=1 
TraceFile=/opt/CA/siteminder/db/odbctrace.out
TraceDll=/opt/CA/siteminder/odbc/lib/NStrc28.so
InstallDir=/opt/CA/siteminder/odbc
DBCTraceMaxFileSize=204,800
ODBCTraceMaxNumFiles=50    


To turn on the internal snoop set the following connection options in the odbc.ini file

EnablePacketLogging = 1
PacketLoggingFlush=1
PacketLoggingFilePrefix=/tmp/DB2_snoop_