Credhub instances lose connection with DB



Article ID: 298347


Updated On:

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

A bug introduced in CredHub 2.9.7 can leave TAS CredHub instances inoperative even though their processes are still running.

The CredHub VM logs will show an error similar to the following:
2022-02-21T11:32:02.173Z [https-jsse-nio-8844-exec-15] .... ERROR --- DefaultExceptionHandler: An application error occurred. Please contact your CredHub administrator. org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms. at org.springframework.jdbc.dataso

The problem is often noticed first in other components, such as applications crashing with the following error:
[ERR] Unable to interpolate credhub refs: Unable to interpolate credhub references: An application error occurred. Please contact your CredHub administrator.

or users no longer being able to deploy or query config-server, with the following error in the backing app:
2022-01-31 11:24:32.182 ERROR 31 --- [ctor-http-nio-1] gCreateServiceInstanceAppBindingWorkflow : Error storing binding credentials with name '/c/p.spring-cloud-services-scs-service-broker/aaabbbcccdddd
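
To confirm that a CredHub VM is affected, its log can be searched for the HikariPool timeout message shown in the CredHub VM log excerpt above. The following is a minimal sketch, not an official diagnostic; the function name count_pool_timeouts is hypothetical, and the log path in the usage comment is the one used by the workaround script in this article.

```shell
#!/bin/bash
# Count the lines in a log file that carry the connection-pool timeout message.
# Usage: count_pool_timeouts /var/vcap/sys/log/credhub/credhub.log
count_pool_timeouts() {
    local log_file="$1"
    # -F treats the pattern as a fixed string; -c prints the number of matching lines.
    # grep exits non-zero when the count is 0, but still prints the count.
    grep -cF "Connection is not available, request timed out after 30000ms" -- "$log_file"
}
```

A non-zero count suggests the instance has hit the issue at least once since the log was last rotated.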


Environment

Product Version: 2.11

Resolution

A permanent fix will be released on Apr 15. Until then, there are two possible workarounds.

The immediate workaround is to restart the credhub process on the CredHub VMs with "monit restart credhub".
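
The manual restart can be wrapped in a small helper that also prints the monit status afterwards. This is only a sketch: the function name restart_credhub is hypothetical, and the monit path is assumed to be the same one the workaround script in this article uses.

```shell
#!/bin/bash
# Path to the monit binary (assumed; same path as in the workaround script).
# It can be overridden through the MONIT environment variable.
MONIT="${MONIT:-/var/vcap/bosh/bin/monit}"

# Restart the credhub process, then print monit's status summary for the VM.
restart_credhub() {
    "$MONIT" restart credhub || return 1
    "$MONIT" summary
}
```

Run restart_credhub as root on each affected CredHub VM. The credhub process may take a short while to show as running again in the summary.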

A longer-lasting workaround is a script that monitors credhub.log and restarts credhub when it detects the connection timeout issue. This can reduce the pressure on operators. The steps are as follows.

  1. Install the script as /root/credhub-32-workaround.sh on the CredHub VM

  2. Execute the script

    • nohup ./credhub-32-workaround.sh &
  3. Verify it is running with "ps -o pid,command" and/or by running "cat" against the script log

    1. credhub/f14d9b9b-d7d1-4706-9692-df6bde94c7dc:~# ps -o pid,command
        PID COMMAND
        717 sudo su -
        718 su -
        719 -su
        812 /bin/bash ./credhub-workaround.sh
       1286 tail -n 0 -f /var/vcap/sys/log/credhub/credhub.log
       1287 /bin/bash ./credhub-workaround.sh
       1541 ps -o pid,command
    2. cat /var/vcap/sys/log/credhub/credhub-32-workaround.log


This is the script code.

 
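#!/bin/bash

# Log file for this workaround's own activity
LOG_FILE=/var/vcap/sys/log/credhub/credhub-32-workaround.log
# CredHub log to watch, and the message that signals the connection timeout
CREDHUB_LOG=/var/vcap/sys/log/credhub/credhub.log
PATTERN="Connection is not available, request timed out after 30000ms"

# Follow credhub.log from its current end and restart credhub
# as soon as a line containing the timeout message appears.
function monitor_log () {
    tail -n 0 -f "$CREDHUB_LOG" | while IFS= read -r line
    do
        # Literal substring match; no regular expression is needed
        if [[ "$line" == *"$PATTERN"* ]]
        then
            echo "$(date) detected log line" >> "$LOG_FILE"
            echo "$(date) $line" >> "$LOG_FILE"
            echo "$(date) restarting credhub" >> "$LOG_FILE"
            /var/vcap/bosh/bin/monit restart credhub
            break
        fi
    done
}

# After each restart, wait 5 minutes before monitoring again
while true
do
    echo "$(date) starting monitor loop" >> "$LOG_FILE"
    monitor_log
    echo "$(date) monitor loop exited sleeping before restarting" >> "$LOG_FILE"
    sleep 300
done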
 
  Example log file after errors were injected into credhub.log:

 

Tue Mar 29 16:19:38 UTC 2022 starting monitor loop
Tue Mar 29 16:21:59 UTC 2022 detected log line
Tue Mar 29 16:21:59 UTC 2022 org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
Tue Mar 29 16:21:59 UTC 2022 restarting credhub
Tue Mar 29 16:22:04 UTC 2022 monitor loop exited sleeping before restarting
Tue Mar 29 16:27:04 UTC 2022 starting monitor loop
Tue Mar 29 16:27:42 UTC 2022 detected log line
Tue Mar 29 16:27:42 UTC 2022 org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
Tue Mar 29 16:27:42 UTC 2022 restarting credhub
Tue Mar 29 16:27:46 UTC 2022 monitor loop exited sleeping before restarting
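
To see how often the workaround has fired, the "restarting credhub" entries in its log can be counted. This is a minimal sketch; the function name count_restarts is hypothetical, and the default path is the LOG_FILE value from the script above.

```shell
#!/bin/bash
# Count how many times the workaround script has restarted credhub,
# based on the "restarting credhub" entries it writes to its own log.
# Usage: count_restarts [path-to-workaround-log]
count_restarts() {
    local workaround_log="${1:-/var/vcap/sys/log/credhub/credhub-32-workaround.log}"
    # -F: fixed-string match; -c: print the number of matching lines
    grep -cF "restarting credhub" -- "$workaround_log"
}
```

Run against the example log above, this prints 2, one for each restart. A count that keeps growing indicates the underlying connection issue is still recurring.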