search cancel

MSSQL DB Connection Errors & Timeouts

book

Article ID: 201484

calendar_today

Updated On:

Products

CA Release Automation - Release Operations Center (Nolio) CA Release Automation - DataManagement Server (Nolio)

Issue/Introduction

After recently migrating from one SQL version to another we are seeing various errors in the nolio_dm_all.log files. In addition to seeing these errors we found that deployments would sometimes hang for a long time. We have tried restarting Nolio services on the managemnet server and and clearing the hibernate cache to resolve the issue. Yet the messages and behaviors persist. 

Messages include:

  • Unexpected error on server [Could not open JPA EntityManager for transaction; nested exception is javax.persistence.PersistenceException: org.hibernate.exception.JDBCConnectionException: Could not open connection]. Please refer to server logs with timestamp [2020-10-12 11:00:55].
    Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host <db hostname>, port 1433 has failed. Error: "connect timed out. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
  • 2020-10-10 15:16:40,334 [periodicTasksMasterMonitor-1] WARN  (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:143) - SQL Error: 0, SQLState: 08S01
  • 2020-10-10 15:16:40,335 [periodicTasksMasterMonitor-1] ERROR (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:144) - The TCP/IP connection to the host <db hostname>, port 1433 has failed. Error: "connect timed out. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".

Cause

This can happen due to the maxTotal db connections being reached. It can also be related to other environmental related jdbc/mssql connection problems. 

To check if the number of active connections is close to, or at, the current maxTotal value, please access the JMX -> noliocenter:type=DataSource. Then, in the Attributes section of this page, take note of your maxTotal and NumActive values. 

JMX on the NAC, by default, is accessible via: http://mgmtservername:20203

Environment

Release : 6.6

Component : CA RELEASE AUTOMATION RELEASE OPERATIONS CENTER

Resolution

After making the following changes the errors outlined in the Issue/Introduction section did not reoccur:

  • Increase the maxTotal db setting from 50 to 100 in the webapps/datamanagement/WEB-INF/wrapperContext.xml.

 

Important:

The following update can also be considered if you're using MSSQL and using an AlwaysOn Availability Group. If you're not, or if you're not sure, then do not make this modification. 

  • Discuss with your DBAs if "multiSubnetFailure=true" should be added to the database.jdbc.url. This is specified in the webapps/datamanagement/WEB-INF/database.properties.

 

The "multiSubnetFailure=true" is a setting that can/should only be attempted if:

  1. You are using an MSSQL AlwaysOn Availability Group; and
  2. The webapps/datamanagement/WEB-INF/distributed.properties is configured to point to the AlwaysOn Availability Group server/listener. 

For more information about the MSSQL AlwaysOn, Availability Groups and the multiSubnetFailover option please consult appropriate Microsoft documentation. The following URLs may offer a starting point:

  • https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/overview-of-always-on-availability-groups-sql-server?view=sql-server-ver15
  • https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/always-on-availability-groups-sql-server?view=sql-server-ver15
  • https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/sqlclient-support-for-high-availability-disaster-recovery

 

Note: 

The changes above did not resolve the hung deployment problems. This behavior is often associated with MSSQL server resource utilization exhaustion (ex: Memory and/or CPU). Please consult with your DBA to investigate the performance/utilization of the MSSQL Server. 

 

Additional Information

Regarding database connections, you can see how many connections the management server is making to the database using the management server's jmx console. To do this:

  • Open the JMX console: http://nac_server_hostname:20203
  • Find and click on the MBean for: noliocenter:type=DataSource
  • In the list of Attributes there is a "NumActive". 
  • In the list of Attributes there is also a "maxTotal" which represents the number set via the wrapperContext.xml file. 

 

If tracing is necessary then please review the following KB article: NAC & SQLException Tracing