Need details on error - All database pools are inactive

Article ID: 229617

Products

CA Strong Authentication, CA Risk Authentication

Issue/Introduction

We saw some exceptions in our Arcot logs and need clarification on them. According to the logs, there was an issue with the database, but our DBAs report that everything was running fine on the DB side. Please explain when the ArDBPoolManagerImpl class and its getLockedDBConnection method are called.

Arcot Logs:  (Timings are in PST)

11/21/21 01:01:09.805 INFO  TXN_WS       00041214 826493349 - ArDBPoolManagerImpl::getLockedDBConnection: Pool marked as active [primary] is not alive *now*. Will check another pool
11/21/21 01:01:09.805 WARN  TXN_WS       00041214 826493349 - ArDBM::Caught ArcotException in _DbOp!. err : [Arcot Exception,Error: All database pools are inactive]
11/21/21 01:01:09.805 WARN  TXN_WS       00041214 826493349 - Failed to get time from DB (System time will be taken)
11/21/21 01:01:09.805 INFO  TXN_WS       00041214 826493349 - Txn-Begin : TxnID=826493349 | ClientTxnID=[] | Protocol=4 (TXN_WS) | ReqSize=826 | TST=2021-11-21 09:01:09:805 (SYS)
11/21/21 01:01:09.806 WARN  TXN_WS       00041214 826493349 - All the Databases are down!.Aborting the transaction and setting response code as INTERNAL_ERROR
11/21/21 01:01:09.806 INFO  TXN_WS       00041214 826493349 - Caught ArWFTxnAbortException in WebFortFrameworkImpl::process while attempting to do Self check prior to transaction processing.
11/21/21 01:01:09.806 INFO  TXN_WS       00041214 826493349 - ArWFTxnAbortException.  : Response Code: [1000] Reason Code: [2000] Detail: [All the Databases are down - detected by self check prior to transaction processing]
11/21/21 01:01:09.806 INFO  TXN_WS       00041214 826493349 - Txn-End : TxnID=826493349 | ClientTxnID=[] | Processor=21 (USERMGMT) | Operation=2003 (ISSUANCE_USER_FETCH) | Response=1000 (INTERNAL_ERROR) | Reason=2000 (ALL_DB_DOWN) | RespSize=2887 | Time=3 | DBT=0 | NQ=1 | ExtEvents={ NONE } | AddInfo=[NONE] | LTB=01098 | LNL=0005/0005 | LML=225
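When scanning a large log for this condition, it can help to split the Txn-End line into its fields. The following is a minimal sketch, assuming the fields are always separated by " | " and written as key=value, as in the excerpt above. The function names are illustrative and are not part of the product.

```python
def parse_txn_end(line: str) -> dict:
    """Split the 'Txn-End : ...' part of a log line into key/value pairs.

    Assumes fields are separated by ' | ' and written as key=value,
    as in the log excerpt above. Returns {} for lines without 'Txn-End : '.
    """
    _, _, payload = line.partition("Txn-End : ")
    fields = {}
    for part in payload.split(" | "):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_all_db_down(fields: dict) -> bool:
    """True when the transaction ended with the ALL_DB_DOWN reason."""
    return "ALL_DB_DOWN" in fields.get("Reason", "")


# Shortened copy of the Txn-End line from the excerpt above.
sample = (
    "11/21/21 01:01:09.806 INFO  TXN_WS       00041214 826493349 - "
    "Txn-End : TxnID=826493349 | ClientTxnID=[] | Processor=21 (USERMGMT) | "
    "Operation=2003 (ISSUANCE_USER_FETCH) | Response=1000 (INTERNAL_ERROR) | "
    "Reason=2000 (ALL_DB_DOWN) | RespSize=2887 | Time=3 | DBT=0 | NQ=1"
)

fields = parse_txn_end(sample)
print(fields["TxnID"], fields["Response"], is_all_db_down(fields))
```

Counting the lines for which is_all_db_down returns True gives a rough picture of how long the condition lasted.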

 

 

Environment

Release : 9.1

Component : AuthMinder (Arcot WebFort)

Resolution

The product works as follows. When the driver (Progress DataDirect) detects a problem, it returns the DB error code. If that error code exists in our database table (ARWFDBERRORCODES), we attempt a failover. As part of that process, all connections made to the primary pool must first be returned to the connection pool; only then do we attempt the failover.

If a failover DB is defined, we connect to the backup DB but keep pinging the primary DB. Once the primary DB is available again, connections are made to the primary DB again.

The product recovers well once the database is responding to requests again. Recovery may be delayed only if connections are not released back to the pool, which can explain a prolonged occurrence of the issue.
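The behaviour described above can be illustrated with a small simulation. This is only a sketch of the described logic, not the product's implementation: the classes, method names, and the error code E_LINK_DOWN are hypothetical, and the real set of failover error codes lives in ARWFDBERRORCODES.

```python
class AllPoolsInactive(Exception):
    """Raised when no pool can hand out a connection."""


class Pool:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.checked_out = 0  # connections currently lent out


class PoolManager:
    """Hypothetical model of primary/backup pool selection."""

    def __init__(self, primary, backup=None, failover_codes=()):
        self.primary = primary
        self.backup = backup
        # Stand-in for the error codes listed in ARWFDBERRORCODES.
        self.failover_codes = set(failover_codes)

    def on_db_error(self, code):
        # Only error codes registered for failover mark the primary as down.
        if code in self.failover_codes:
            self.primary.alive = False

    def get_connection(self):
        if self.primary.alive:
            pool = self.primary  # primary is preferred whenever it is alive
        elif self.primary.checked_out > 0:
            # Failover waits until all primary connections are returned.
            raise AllPoolsInactive("failover pending: connections not released")
        elif self.backup is not None and self.backup.alive:
            pool = self.backup
        else:
            raise AllPoolsInactive("All database pools are inactive")
        pool.checked_out += 1
        return pool

    def release(self, pool):
        pool.checked_out -= 1


mgr = PoolManager(Pool("primary"), Pool("backup"), failover_codes={"E_LINK_DOWN"})
conn = mgr.get_connection()        # served by the primary pool
mgr.on_db_error("E_LINK_DOWN")     # driver reports a failover-class error
try:
    mgr.get_connection()           # blocked: one connection is still lent out
    blocked = False
except AllPoolsInactive:
    blocked = True
mgr.release(conn)                  # connection returned to the pool
after_release = mgr.get_connection().name
mgr.primary.alive = True           # a ping to the primary succeeds again
after_recovery = mgr.get_connection().name
print(blocked, after_release, after_recovery)
```

In this model, a connection that is never released keeps the failover blocked, which mirrors the delayed recovery described above.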

Here is a KB article from Progress DataDirect that describes the cause and some resolution steps.

https://knowledgebase.progress.com/articles/Article/000029398

Please review it. The odbc.ini equivalent of SQL_ATTR_QUERY_TIMEOUT is QueryTimeout; set it to 10 seconds there.

The other place to configure this is the Master Admin console: Services and Server Configurations -> Strong Authentication configuration -> Instance Management -> Database Configurations. Specify 10 seconds for QueryTimeout there as well.
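For the odbc.ini side of the change, a quick check of which data sources already have QueryTimeout set can be scripted. This is a minimal sketch using Python's standard configparser; the DSN name arcotdsn, the driver description, and the driver path in the sample text are placeholders, not values from a real installation.

```python
import configparser


def query_timeouts(odbc_ini_text: str) -> dict:
    """Return {section name: QueryTimeout value or None} for odbc.ini text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(odbc_ini_text)
    # Option lookup is case-insensitive with configparser's default settings.
    return {
        section: parser.get(section, "QueryTimeout", fallback=None)
        for section in parser.sections()
    }


# Placeholder odbc.ini content; the DSN name and driver path are hypothetical.
sample_ini = """
[ODBC Data Sources]
arcotdsn=Example driver description

[arcotdsn]
Driver=/path/to/driver.so
QueryTimeout=10
"""

timeouts = query_timeouts(sample_ini)
print(timeouts)
```

To check a real file, read its contents and pass the text to query_timeouts. Any data source section that reports None has no QueryTimeout entry yet.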