SOI alerts are delayed: "internal connection pool has reached its maximum size"


Article ID: 219081


Products

CA Service Operations Insight (SOI)

Issue/Introduction

In SOI, alerts arrive in the queue with a significant delay. As a result, alerts are enriched up to one hour after log time.

The following message was seen in the soimgr log:

"The internal connection pool has reached its maximum size and no connection is currently available"

Environment

Release : 4.2 - CA Service Operations Insight Version 4.2.0.551.20201221

Component : SOI ALERT MANAGEMENT

Cause

The root cause of this issue appears to be the AlarmGlobalUpdates queue: when jobs back up in this queue, alert processing is delayed.

Resolution

Increase the value of connection.pool_size in CA\SOI\tomcat\lib\hibernate.cfg.xml from

<property name="connection.pool_size">20</property>

to

<property name="connection.pool_size">50</property>
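
If you prefer to script the change, the following is a minimal Python sketch, not an official Broadcom tool: it backs up the file and bumps the value in place. The relative path is an assumption, so adjust it to your SOI install root. Hibernate configuration is typically read at service startup, so restart the SOI Manager services for the new pool size to take effect.

import re
import shutil

cfg = r"CA\SOI\tomcat\lib\hibernate.cfg.xml"  # assumed path, relative to the SOI install root

shutil.copyfile(cfg, cfg + ".bak")  # keep a backup of the original file

with open(cfg, "r", encoding="utf-8") as f:
    text = f.read()

# Replace the current pool size (whatever it is) with 50.
new_text, count = re.subn(
    r'(<property name="connection\.pool_size">)\d+(</property>)',
    r"\g<1>50\g<2>",
    text,
)
if count != 1:
    raise SystemExit("expected exactly one connection.pool_size property")

with open(cfg, "w", encoding="utf-8") as f:
    f.write(new_text)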

Additional Information

1 •   What is the parameter connection.pool_size for, and how does it relate to the issue described?

The hibernate.cfg.xml file on the SOI Manager configures the JDBC connections used to communicate with the database. The connection.pool_size setting provides a connection pool for the application; by default it maintains 20 connections. This value can be adjusted based on the environment (network connection, memory, CPU, and so on).

2 • Why did SOI stop processing the alert queue without clear evidence of errors or other issues in the logs? We could only see AlarmGlobalUpdates increasing. How can we monitor for an increasing AlarmGlobalUpdates count?

The root cause appears to be the AlarmGlobalUpdates queue: when jobs back up in this queue, alert processing is delayed.
You may need to purge older data from the SAMStore database. Also start the SOI Manager services in order (see the KB article below); if all connectors start at the same time, you may run into performance issues such as a stuck AlarmGlobalUpdates queue.

Procedure to start SOI Manager & Connector Services: 
https://knowledge.broadcom.com/external/article/135251/procedure-to-start-soi-manager-connecto.html

The best option is to monitor the Job queue status page (Manager Debug Page) at regular intervals, or to install and configure an APM agent to monitor these queues.
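
As an illustration only, the Python sketch below polls a queue status page on a schedule and flags a count that keeps growing. The URL, port, and page layout used here are placeholders, not the documented format of the Manager Debug Page, so all of them must be adapted to your environment.

import re
import time
import urllib.request

DEBUG_PAGE = "http://soi-manager:8080/debug/jobqueues"  # placeholder address, not the real debug page URL
QUEUE = "AlarmGlobalUpdates"

prev = None
while True:
    page = urllib.request.urlopen(DEBUG_PAGE, timeout=10).read().decode("utf-8", "replace")
    # Assumes the page shows the queue name followed by its job count.
    match = re.search(QUEUE + r"\D*(\d+)", page)
    count = int(match.group(1)) if match else -1
    # A count that only grows across polls suggests the queue may be stuck.
    trend = "growing" if prev is not None and count > prev else "ok"
    print(QUEUE + ": " + str(count) + " jobs (" + trend + ")")
    prev = count
    time.sleep(60)  # poll once a minute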

Any update to an alert may trigger many update jobs, which are queued in the AlarmGlobalUpdates job queue.

A growing AlarmGlobalUpdates job queue count is not a problem by itself, as long as jobs are still being processed.

In a running environment this can happen; it is better to wait and monitor the job queues until all jobs are processed, which may take a few minutes to a few hours depending on the job type and the number of jobs in the queue.

If the job queues are stuck, it is better to restart the SOI services. Stuck queues usually occur on Manager startup, on connector startup, during an alert storm from the connectors, or when the Manager is busy with other processing.

Which logs are required depends on the analysis: it starts from the Manager logs and moves to the logs of the problematic connector if needed.

To keep the SOI database and the environment healthy, regularly purging old data is a good option. Purging once a week is sufficient; daily purging is not required, although there is no harm in it.