Bridge for Git takes a long time to start up and fails with RC=0100

Products

Endevor

Issue/Introduction

The Endevor Bridge for Git started task intermittently takes a long time to start up and then fails with RC=0100.

Resolution

The started task log contained database timeout errors:
*****
Caused by: java.sql.SQLTimeoutException: Login timeout exceeded.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.jdbc.InternalDriver.timeLogin(Unknown Source)
at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.connect(Unknown Source)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:138)
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:364)
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:206)
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:476)
at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:561)
... 137 more
Caused by: ERROR XBDA0: Login timeout exceeded.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 150 more
JVMJZBL1014I Waiting for non-daemon Java threads to finish before exiting...
JVMJZBL2999I JZOS batch launcher elapsed time=427 seconds, cpu time=11.910000 seconds (zOS release 29)
JVMJZBL1047W JZOS batch launcher completed with Java exception, return code=100

...

2025-07-10 06:07:22.522 ERROR 67174544 --- [ main] com.zaxxer.hikari.pool.HikariPool : liquibase - Exception during pool initialization.
java.sql.SQLTimeoutException: Login timeout exceeded.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
...
Caused by: org.apache.derby.iapi.error.StandardException: Login timeout exceeded.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 150 common frames omitted
*****
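To check other started task logs for the same failure, a small Java sketch like the one below could be used. It looks for the message text and Derby error code shown above (`Login timeout exceeded`, `ERROR XBDA0`) and extracts the JZOS launcher's elapsed time and return code from messages JVMJZBL2999I and JVMJZBL1047W. The class name `BfgStartupLogScan` is hypothetical and not part of Bridge for Git, and the sketch assumes the log has been saved as a plain text file readable with the platform default charset.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: scans a saved started task log for the Derby login timeout symptom. */
public class BfgStartupLogScan {

    // JVMJZBL2999I ... elapsed time=427 seconds, ...
    private static final Pattern ELAPSED = Pattern.compile("elapsed time=(\\d+) seconds");
    // JVMJZBL1047W ... return code=100
    private static final Pattern RETURN_CODE = Pattern.compile("return code=(\\d+)");

    /** What was found in one log; -1 means the value was not present. */
    public static final class Findings {
        public boolean loginTimeout;
        public boolean derbyXbda0;
        public int elapsedSeconds = -1;
        public int returnCode = -1;

        /** True when both the JDBC message and the Derby error code were seen. */
        public boolean isDerbyLoginTimeout() {
            return loginTimeout && derbyXbda0;
        }

        @Override
        public String toString() {
            return "derbyLoginTimeout=" + isDerbyLoginTimeout()
                    + ", elapsedSeconds=" + elapsedSeconds
                    + ", returnCode=" + returnCode;
        }
    }

    public static Findings scan(List<String> lines) {
        Findings f = new Findings();
        for (String line : lines) {
            if (line.contains("Login timeout exceeded")) {
                f.loginTimeout = true;
            }
            if (line.contains("ERROR XBDA0")) {
                f.derbyXbda0 = true;
            }
            // JZOS messages can share a line with other output, so search anywhere in the line.
            Matcher elapsed = ELAPSED.matcher(line);
            if (elapsed.find()) {
                f.elapsedSeconds = Integer.parseInt(elapsed.group(1));
            }
            Matcher rc = RETURN_CODE.matcher(line);
            if (rc.find()) {
                f.returnCode = Integer.parseInt(rc.group(1));
            }
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a text copy of the started task log
        List<String> lines = Files.readAllLines(Path.of(args[0]), Charset.defaultCharset());
        System.out.println(scan(lines));
    }
}
```

For a log containing only the excerpt above, this would report `derbyLoginTimeout=true, elapsedSeconds=427, returnCode=100`; the launcher's return code of 100 corresponds to the RC=0100 in the title.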

It appeared that the database connection timeout settings in the application.yaml file needed to be increased, as described in the article "Bridge for Git database timeout when attempting restart".
The same user had previously experienced a similar issue, but this time they increased the liquibase connection-timeout value from 400000 ms (about 6.7 minutes) to 5000000 ms (about 83 minutes), which eventually allowed the process to start; startup took 3840000 ms (64 minutes).
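The millisecond values quoted above are easier to compare once converted. The Java sketch below (the class name `BfgTimeoutMath` is hypothetical) uses `java.time.Duration` to convert each value and to show how much headroom the 5000000 ms setting left over the 3840000 ms startup.

```java
import java.time.Duration;

/** Hypothetical helper: converts the connection-timeout values quoted in this article. */
public class BfgTimeoutMath {

    /** Formats a millisecond value as whole minutes plus remaining seconds. */
    public static String describe(long millis) {
        Duration d = Duration.ofMillis(millis);
        return millis + " ms = " + d.toMinutes() + " min " + d.toSecondsPart() + " s";
    }

    /** Time left before the timeout would have expired (negative if it was exceeded). */
    public static long headroomMillis(long timeoutMillis, long observedMillis) {
        return timeoutMillis - observedMillis;
    }

    public static void main(String[] args) {
        long originalTimeout = 400_000L;    // previous liquibase connection-timeout
        long increasedTimeout = 5_000_000L; // increased liquibase connection-timeout
        long observedStartup = 3_840_000L;  // time the started task actually took to start

        System.out.println(describe(originalTimeout));
        System.out.println(describe(increasedTimeout));
        System.out.println(describe(observedStartup));
        System.out.println("headroom: " + describe(headroomMillis(increasedTimeout, observedStartup)));
    }
}
```

Running it prints 6 min 40 s, 83 min 20 s and 64 min 0 s, plus a headroom of 19 min 20 s, which shows how far the original 400000 ms limit was below the time the startup actually needed that day.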
However, the next day startup took only 8 seconds, so why did the long delay occur on the previous day?

Endevor Engineering advised that similar issues at other sites have been traced to CPU resources being reallocated to other, higher-priority workloads.
In this case, the user believed that the general processors (GPs) and specialty engines (zIIPs) were not under stress, but they had seen increased load and locking in the Coupling Facility (CF). Their Systems team observed heavy I/O and, as a consequence, heavy Coupling Facility usage.
Digging deeper, this XCF (cross-system coupling facility) usage was found to be due to zFS file updates: presumably to ensure integrity, the system communicates these z/OS UNIX (OMVS) file updates sysplex-wide using the Coupling Facilities. Because the non-production CF engines are shared between the Test, Dev and OAT environments, this causes XCF contention, resulting in CPU overhead.

To summarise, the startup performance issues appeared to be related to sharing a zFS file (BFGDATA) across the sysplex: this file has heavy I/O activity and relies on the shared CF to control locking. Larger systems with dedicated CFs may not experience this issue.
Endevor Engineering also noted that a file-based database such as Apache Derby performs many random accesses to the file system, so having something like the CF in the path, trying to synchronize all of those updates, could explain the performance issue.
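The random-access pattern described above can be illustrated with a small probe. The Java sketch below is an illustration under stated assumptions, not a reproduction of Derby's actual I/O: it writes 4 KB pages at random offsets in a scratch file opened in `RandomAccessFile` `"rwd"` mode, so that each content update is written synchronously to storage, and reports the average time per write. Running it in a scratch directory on the shared zFS and again on a non-shared file system might help show whether synchronous random writes are slower there. The class name `RandomSyncWriteProbe`, the page size and the write count are hypothetical choices; do not point it at the Derby database directory itself.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

/** Hypothetical probe: average latency of synchronous random page writes in a directory. */
public class RandomSyncWriteProbe {

    /**
     * Writes count pages of pageSize bytes at random page offsets within a file
     * of the given number of pages. Mode "rwd" makes every write of file content
     * synchronous, similar in spirit to a database forcing pages to disk.
     */
    public static long averageWriteNanos(Path file, int pages, int pageSize, int count, long seed) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        byte[] page = new byte[pageSize];
        Random random = new Random(seed);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rwd")) {
            raf.setLength((long) pages * pageSize);
            long start = System.nanoTime();
            for (int i = 0; i < count; i++) {
                random.nextBytes(page);
                raf.seek((long) random.nextInt(pages) * pageSize);
                raf.write(page);
            }
            return (System.nanoTime() - start) / count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a scratch file in dir, probes it, then deletes it. */
    public static long probeDirectory(Path dir, int pages, int pageSize, int count) {
        try {
            Path file = Files.createTempFile(dir, "bfg-probe-", ".dat");
            try {
                return averageWriteNanos(file, pages, pageSize, count, 42L);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // args[0]: scratch directory to test, e.g. one on the shared zFS (defaults to java.io.tmpdir)
        Path dir = Path.of(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        long avg = probeDirectory(dir, 1024, 4096, 500);
        System.out.printf("average synchronous random write: %.3f ms%n", avg / 1_000_000.0);
    }
}
```

Absolute timings depend heavily on the storage and caching in use, so only a side-by-side comparison on the same system is likely to be meaningful.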