CA Release Automation - Release Operations Center (Nolio)
CA Release Automation - DataManagement Server (Nolio)
Issue/Introduction
Problem: Intermittent database connection loss causes release jobs to hang.
Environment
CA Release Automation 6.4.0
Cause
The cause was JDBC connection errors resulting from an invalid IP address: one of the two IP addresses assigned to the alias of the SQL Server's listener was invalid. The following errors were found in nolio_dm_all.log while the JDBC connection errors were occurring:
2018-07-24 19:56:12,134 [http-nio-8080-exec-10] WARN (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:143) - SQL Error: 0, SQLState: 08S01
2018-07-24 19:56:12,135 [http-nio-8080-exec-10] ERROR (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:144) - The TCP/IP connection to the host <yourMsSqlServername>, port 1433 has failed. Error: "No route to host. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
2018-07-24 19:56:12,136 [http-nio-8080-exec-10] ERROR (com.nolio.releasecenter.controllers.TokenController:96) - Controller method error occurred.
org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is javax.persistence.PersistenceException: org.hibernate.exception.JDBCConnectionException: Could not open connection
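One way to confirm this kind of cause is to resolve the listener alias and try a TCP connection to each address it returns. The following is a minimal sketch of that check, not a vendor-supplied tool; the function names are hypothetical, and port 1433 is taken from the log message above.

```python
import socket


def resolve_addresses(host, port):
    """Return the distinct IP addresses that `host` resolves to, in order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def probe(ip, port, timeout=3.0):
    """Attempt a TCP connection; return (True, None) or (False, error text)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True, None
    except OSError as exc:
        return False, str(exc)


def check_listener(host, port=1433, timeout=3.0):
    """Probe every address behind `host`; return a list of (ip, ok, error)."""
    return [(ip, *probe(ip, port, timeout))
            for ip in resolve_addresses(host, port)]


# Example (the alias name is a placeholder for your own listener alias):
# for ip, ok, err in check_listener("yourMsSqlServername"):
#     print(ip, "reachable" if ok else "FAILED: " + err)
```

If one address connects and another fails, the symptom matches the one described here: connections succeed or fail depending on which address the client happens to use.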
Resolution
The solution was to remove the invalid IP address (one of the two that were assigned) from the alias used for the SQL Server's listener. Once this has been done, stop the services on the CA Release Automation management server, remove the Hibernate cache data by deleting the files and folders found inside RAMgmtServerInstallDir\temp\*, and restart the services on the CA Release Automation management server.
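The cache cleanup step can be scripted. The sketch below deletes everything inside a directory while leaving the directory itself in place; the function name is hypothetical, and the path must be set to the temp folder under your own management server installation directory. Run it only while the services are stopped.

```python
import shutil
from pathlib import Path


def clear_directory(directory):
    """Delete all files and folders inside `directory`, keeping the folder.

    Returns the sorted names of the entries that were removed.
    """
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    removed = []
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a subfolder and its contents
        else:
            entry.unlink()         # remove a file or a symlink
        removed.append(entry.name)
    return sorted(removed)


# Example (placeholder path; substitute the real installation directory):
# clear_directory(r"RAMgmtServerInstallDir\temp")
```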
Additional Information
Other errors observed during the same period were HTTP 500 errors while an agent was running the action "ROC - Get Artifact". The 500 error occurred because the agent in turn requests data from the management server, and that request fails when the management server cannot query the database for the data to return to the agent.