Tanzu Hub Installation/Upgrade fails during the step: "Running errand Installing and configuring packages for Tanzu Hub."
When checking the cluster status, you may notice that postgresql-init-job is stuck in the Running state instead of Completed. Similarly, the corresponding pod for this job will be stuck in Running instead of reaching a Completed state.
Checking the pod logs indicates that it is failing to reach the PostgreSQL server and is retrying in a loop, as shown below:
Waiting for PostgreSQL to be ready...
postgresql:5432 - no response
PostgreSQL not ready yet, waiting 5 seconds...
postgresql:5432 - no response
PostgreSQL not ready yet, waiting 5 seconds...
postgresql:5432 - no response
PostgreSQL not ready yet, waiting 5 seconds...
postgresql:5432 - no response
PostgreSQL not ready yet, waiting 5 seconds...
postgresql:5432 - no response
PostgreSQL not ready yet, waiting 5 seconds...
postgresql:5432 - no response
PostgreSQL not ready yet, waiting 5 seconds...
postgresql:5432 - no response
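The status and log checks described above can be run with kubectl roughly as follows. This is a minimal sketch, not an official procedure: the tanzusm namespace is taken from the kctrl commands later in this article, the job-name label is the one Kubernetes adds to pods created by a Job, and the function names and the KUBECTL, NAMESPACE and JOB_NAME variables are hypothetical helpers introduced here for illustration.

```shell
#!/bin/bash
# Hypothetical helpers; override the variables to match your environment.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-tanzusm}"          # assumption: namespace used by the kctrl commands in this article
JOB_NAME="${JOB_NAME:-postgresql-init-job}"

# Show the job and the pod(s) it created.
show_init_job_state() {
  "$KUBECTL" get job "$JOB_NAME" -n "$NAMESPACE"
  "$KUBECTL" get pods -n "$NAMESPACE" -l "job-name=$JOB_NAME"
}

# Print the last lines of the job's pod log (the pg_isready retry loop).
show_init_job_logs() {
  "$KUBECTL" logs "job/$JOB_NAME" -n "$NAMESPACE" --tail=20
}
```

Calling show_init_job_state followed by show_init_job_logs should surface the Running job and the repeating "no response" messages if this issue is present.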
The init script run by this job is mounted as a ConfigMap, as shown in the snippet below.
spec:
  containers:
  - command:
    - /bin/bash
    - /scripts/db-init-script.sh
    volumeMounts:
    - mountPath: /scripts
      name: init-script
      readOnly: true
  volumes:
  - configMap:
      defaultMode: 420
      name: postgresql-init-script-ver-1
    name: init-script
Note: The specification has been truncated to include only the relevant lines.
Below is the snippet from the initialization script created using this ConfigMap.
db-init-script.sh: |
  #!/bin/bash
  shopt -s expand_aliases
  set -e
  alias psql="/opt/vmware/postgres/15/bin/psql"
  alias pg_isready="/opt/vmware/postgres/15/bin/pg_isready"
  echo "Waiting for PostgreSQL to be ready..."
  until pg_isready -h $POSTGRES_SVC -U $POSTGRES_USER -d $POSTGRES_DB; do
    echo "PostgreSQL not ready yet, waiting 5 seconds..."
    sleep 5
  done
  echo "PostgreSQL is ready, running init script..."
The failure occurs on the following line:
pg_isready -h $POSTGRES_SVC -U $POSTGRES_USER -d $POSTGRES_DB
By default, "pg_isready" waits 3 seconds for a response before giving up. If environment-related issues such as network latency or slow disk I/O cause the connection to the PostgreSQL server to take longer than 3 seconds, every check reports "no response" and the job keeps retrying without ever completing.
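To confirm that the 3-second default is the limiting factor, the same check can be repeated with a longer timeout. The sketch below is illustrative only: probe_postgres and the PG_ISREADY variable are hypothetical names, while the -h and -t options and the exit codes (0 accepting, 1 rejecting, 2 no response) are those documented for pg_isready.

```shell
#!/bin/bash
# Hypothetical helper; PG_ISREADY may point at the full binary path if needed.
PG_ISREADY="${PG_ISREADY:-pg_isready}"

# Usage: probe_postgres <host> [timeout_seconds]
# Runs one pg_isready check and translates its exit code into a message.
probe_postgres() {
  local host="$1" timeout="${2:-3}" rc=0
  "$PG_ISREADY" -h "$host" -t "$timeout" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections" ;;
    2) echo "no response within ${timeout}s" ;;
    *) echo "probe could not run (exit code $rc)" ;;
  esac
  return "$rc"
}
```

If probe_postgres "$POSTGRES_SVC" 3 reports no response while probe_postgres "$POSTGRES_SVC" 10 reports that the server is accepting connections, the environment is likely slower than the default timeout allows. The probe has to run somewhere the pg_isready binary is available and the PostgreSQL service name resolves.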
As a workaround, we can increase this timeout to 5 or 10 seconds using the following steps:
1) Pause the postgresql and sm packages using the commands below:
kctrl package installed pause -i postgresql -n tanzusm --yes
kctrl package installed pause -i sm -n tanzusm --yes
2) Edit the ConfigMap "postgresql-migration-scripts-ver-1" and add the -t option to the pg_isready call to raise the timeout to 5 seconds:
pg_isready -h $POSTGRES_SVC -U $POSTGRES_USER -d $POSTGRES_DB -t 5
3) Delete the pod postgresql-init-job-xxxxxx so the job creates a new one.
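Steps 2 and 3 can be sketched in shell as shown below. Treat this as an assumption-laden illustration rather than the supported procedure: add_pg_isready_timeout and recreate_init_pod are hypothetical helpers, the filter assumes the until-line does not already carry a -t option, and the pod is selected through the job-name label that Kubernetes sets on Job pods instead of by its generated name.

```shell
#!/bin/bash
# Hypothetical helpers; override the variables to match your environment.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-tanzusm}"
JOB_NAME="${JOB_NAME:-postgresql-init-job}"

# Filter (stdin -> stdout): append "-t <seconds>" to the pg_isready call
# in the "until pg_isready ...; do" line of the init script.
add_pg_isready_timeout() {
  local seconds="$1"
  case "$seconds" in
    ''|*[!0-9]*) echo "timeout must be a whole number of seconds" >&2; return 1 ;;
  esac
  sed -E "s/^([[:space:]]*until pg_isready [^;]*); do\$/\\1 -t ${seconds}; do/"
}

# Delete the stuck pod(s) of the job, then wait for the job to complete.
recreate_init_pod() {
  "$KUBECTL" delete pod -n "$NAMESPACE" -l "job-name=$JOB_NAME"
  "$KUBECTL" wait --for=condition=complete "job/$JOB_NAME" -n "$NAMESPACE" --timeout=300s
}
```

Editing the ConfigMap interactively with kubectl edit configmap, as in step 2, gives the same result as the filter; the filter only shows what the changed line should look like. After the edit, recreate_init_pod performs step 3 and blocks until the job reports completion or the 300-second wait (an arbitrary value chosen here) expires.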
Both the new pod and the job should now transition to the Completed status, after which you can resume reconciliation of both packages as follows:
kctrl package installed kick -i postgresql -n tanzusm --yes
kctrl package installed kick -i sm -n tanzusm --yes
Note: Once reconciliation is resumed, the ConfigMap will be overwritten with its original values. This has no effect on the fix, because the job has already completed.
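After resuming, the result can be checked along these lines. This is a sketch under assumptions: show_package_state is a hypothetical helper, and it assumes the PackageInstall resources carry the same names (postgresql and sm) that are passed to kctrl with -i.

```shell
#!/bin/bash
# Hypothetical helper; override the variables to match your environment.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-tanzusm}"
JOB_NAME="${JOB_NAME:-postgresql-init-job}"

# Show the reconcile state of both packages and the final state of the job.
show_package_state() {
  "$KUBECTL" get packageinstall postgresql sm -n "$NAMESPACE"
  "$KUBECTL" get job "$JOB_NAME" -n "$NAMESPACE"
}
```

Both packages would be expected to report a successful reconcile, with the job still shown as completed.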