Tanzu Hub Installation/Upgrade fails at the step: Running errand Installing and configuring packages for Tanzu Hub

Article ID: 439816

Products

VMware Tanzu Platform - Hub

Issue/Introduction

Tanzu Hub Installation/Upgrade fails during the step: "Running errand Installing and configuring packages for Tanzu Hub."

When checking the cluster status, you may notice that the job postgresql-init-job is stuck in the Running state instead of Completed. Its pod is likewise stuck in Running instead of reaching the Completed state.

The pod logs show that the job cannot reach the PostgreSQL server and keeps retrying in a loop, as shown below:

Waiting for PostgreSQL to be ready...
postgresql:5432 - no response
PostgreSQL not ready yet, waiting 5 seconds...
postgresql:5432 - no response
PostgreSQL not ready yet, waiting 5 seconds...
postgresql:5432 - no response
PostgreSQL not ready yet, waiting 5 seconds...
postgresql:5432 - no response
PostgreSQL not ready yet, waiting 5 seconds...
postgresql:5432 - no response
PostgreSQL not ready yet, waiting 5 seconds...
postgresql:5432 - no response
PostgreSQL not ready yet, waiting 5 seconds...
postgresql:5432 - no response
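To confirm this symptom, you can inspect the job, its pod, and the retry loop with standard kubectl commands. The following is a minimal sketch. The function name check_init_job is hypothetical. This article does not say which namespace runs postgresql-init-job, so the namespace is passed as an argument.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper to inspect the stuck init job.
# $1 = namespace that runs postgresql-init-job (not stated in the article; an assumption).
check_init_job() {
  local ns="$1"
  # While stuck, COMPLETIONS stays below its target (e.g. 0/1)
  kubectl get job postgresql-init-job -n "$ns"
  # The Job controller labels the pods it creates with job-name=<job name>
  kubectl get pods -n "$ns" -l job-name=postgresql-init-job
  # Recent log lines; a stuck pod keeps printing "postgresql:5432 - no response"
  kubectl logs -n "$ns" job/postgresql-init-job --tail=20
}
```

Usage: `check_init_job <namespace>`.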

Resolution

The init script run by this job is mounted from a ConfigMap, as shown in the snippet below:

  spec:
    containers:
    - command:
      - /bin/bash
      - /scripts/db-init-script.sh
      volumeMounts:
      - mountPath: /scripts
        name: init-script
        readOnly: true
    volumes:
    - configMap:
        defaultMode: 420
        name: postgresql-init-script-ver-1
      name: init-script

Note: The specification above has been truncated and edited to include only the relevant lines. The following snippet is from the initialization script defined in this ConfigMap:

    db-init-script.sh: |
      #!/bin/bash
      shopt -s expand_aliases
      set -e

      alias psql="/opt/vmware/postgres/15/bin/psql"
      alias pg_isready="/opt/vmware/postgres/15/bin/pg_isready"

      echo "Waiting for PostgreSQL to be ready..."
      until pg_isready -h $POSTGRES_SVC -U $POSTGRES_USER -d $POSTGRES_DB; do
        echo "PostgreSQL not ready yet, waiting 5 seconds..."
        sleep 5
      done
      echo "PostgreSQL is ready, running init script..."

The failure occurs on the following line:

pg_isready -h $POSTGRES_SVC -U $POSTGRES_USER -d $POSTGRES_DB

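By default, pg_isready waits at most 3 seconds for a connection before reporting that the server is not responding. In the logs above, this appears as "postgresql:5432 - no response". Environment-related issues such as network latency or slow disk I/O can make the connection to the PostgreSQL server take longer than 3 seconds. When that happens, every check fails, the until loop keeps retrying, and the job never completes.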
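You can reproduce the effect of a per-attempt timeout that is shorter than the connection time without PostgreSQL. This sketch is an illustration, not part of the product. `sleep` stands in for a slow connection, and GNU `timeout` plays the role of the pg_isready `-t` limit. The function name probe is hypothetical.

```shell
#!/usr/bin/env bash
# Simulation only: "sleep" stands in for a slow PostgreSQL connection and GNU
# "timeout" stands in for the pg_isready -t limit. Delays are scaled down to
# fractions of a second; fractional durations require GNU coreutils.

# Hypothetical helper: succeeds only if a response that takes $1 seconds
# arrives within the $2-second limit (exit status 124 means the limit was hit).
probe() {
  local response_time="$1" limit="$2"
  timeout "$limit" sleep "$response_time"
}

# Limit shorter than the response time: the check fails on every attempt,
# so an until-loop around it would keep printing "not ready" indefinitely.
if probe 0.5 0.2; then echo "short limit: ready"; else echo "short limit: no response"; fi

# Same response time with a longer limit: the first attempt succeeds.
if probe 0.5 1.5; then echo "long limit: ready"; else echo "long limit: no response"; fi
```

Running it prints `short limit: no response` followed by `long limit: ready`. Raising the limit above the response time, which is what `-t` does for pg_isready, lets the loop exit.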

As a workaround, increase this timeout to 5 or 10 seconds using the following steps:

1) Pause the postgresql and sm packages so that kapp-controller does not revert the ConfigMap change during the workaround:

kctrl package installed pause -i postgresql -n tanzusm --yes
kctrl package installed pause -i sm -n tanzusm --yes

2) Edit the ConfigMap "postgresql-init-script-ver-1" (the one mounted by the job, as shown above) and add a 5-second timeout to the pg_isready call:

pg_isready -h $POSTGRES_SVC -U $POSTGRES_USER -d $POSTGRES_DB -t 5

3) Delete the pod postgresql-init-job-xxxxxx so that the job creates a new pod, which picks up the updated script.
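Steps 2 and 3 can also be run non-interactively. The following is a minimal sketch, not an official procedure, and it makes these assumptions:

- GNU sed is available.
- The ConfigMap name is the one in the job's volume definition shown earlier (postgresql-init-script-ver-1).
- The namespace that hosts postgresql-init-job is passed as an argument, because this article does not state it.

The function names are hypothetical. Run the sketch only after step 1 has paused both packages.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for steps 2 and 3. Requires GNU sed and kubectl.

# Step 2 helper: append "-t <seconds>" to the pg_isready call in db-init-script.sh.
# Reads ConfigMap YAML (or plain script text) on stdin and writes the result to stdout.
# Running it twice does not add a second -t option.
add_pg_isready_timeout() {
  local seconds="${1:-5}"
  # pg_isready -t takes a whole number of seconds
  if ! [[ "$seconds" =~ ^[0-9]+$ ]]; then
    echo "timeout must be a whole number of seconds" >&2
    return 1
  fi
  sed 's/\(pg_isready -h \$POSTGRES_SVC -U \$POSTGRES_USER -d \$POSTGRES_DB\);/\1 -t '"$seconds"';/'
}

# Apply step 2 to the live ConfigMap ($1 = namespace of the init job, an assumption).
patch_init_configmap() {
  local ns="$1" seconds="${2:-5}"
  kubectl get configmap postgresql-init-script-ver-1 -n "$ns" -o yaml \
    | add_pg_isready_timeout "$seconds" \
    | kubectl replace -f -
}

# Step 3 helper: delete the stuck pod (selected by the job-name label that the Job
# controller adds) and wait until the replacement pod completes the job.
restart_init_pod() {
  local ns="$1" wait_timeout="${2:-600s}"
  kubectl delete pod -n "$ns" -l job-name=postgresql-init-job
  kubectl wait --for=condition=complete job/postgresql-init-job -n "$ns" --timeout="$wait_timeout"
}
```

For example, `patch_init_configmap <namespace> 5` performs step 2 and `restart_init_pod <namespace>` performs step 3. Piping through sed makes the edit repeatable. `kubectl edit` works just as well for a one-off change.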

The new pod and the job should now reach the Completed status. You can then resume reconciliation of both packages:

kctrl package installed kick -i postgresql -n tanzusm --yes
kctrl package installed kick -i sm -n tanzusm --yes

Note: Once reconciliation resumes, the ConfigMap is overwritten with its original values. This does not affect the workaround, because the job has already completed.
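After the kick, you can follow reconciliation until both packages settle. The following is a minimal sketch. It assumes the PackageInstall names postgresql and sm and the tanzusm namespace from the kctrl commands above. The function name is hypothetical. "Reconcile succeeded" is the text kapp-controller typically reports in `.status.friendlyDescription`, which is the DESCRIPTION column of `kubectl get packageinstall`.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper that polls both PackageInstalls after the kick.
# Assumptions: names postgresql and sm in namespace tanzusm (from the kctrl commands);
# "Reconcile succeeded" is the friendlyDescription kapp-controller reports on success.
wait_for_packages() {
  local ns="${1:-tanzusm}" attempts="${2:-60}" interval="${3:-10}"
  local pkgi desc i
  for pkgi in postgresql sm; do
    for ((i = 1; i <= attempts; i++)); do
      desc="$(kubectl get packageinstall "$pkgi" -n "$ns" -o jsonpath='{.status.friendlyDescription}')"
      echo "$pkgi: $desc"
      # Move on to the next package as soon as this one has reconciled
      [ "$desc" = "Reconcile succeeded" ] && continue 2
      sleep "$interval"
    done
    echo "$pkgi did not reach 'Reconcile succeeded' after $attempts checks" >&2
    return 1
  done
}
```

With the defaults, `wait_for_packages tanzusm` checks each package every 10 seconds for up to 10 minutes.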
