Error: "ImagePullBackOff" and "Back-off pulling image "tanzu-sql-postgres.packages.broadcom.com"" causes Tanzu Postgres pods to fail to start after upgrading a Kubernetes cluster


Article ID: 424821


Products

VMware Cloud Director

Issue/Introduction

  • Tanzu Postgres pods have failed to start and have a status of "ImagePullBackOff":

    kubectl get pods -n vcd-ds-workloads

    NAME                        READY  STATUS            RESTARTS  AGE
    example-postgres-0          0/5    ImagePullBackOff  0         #d
    example-postgres-1          0/5    ImagePullBackOff  0         #d
    example-postgres-monitor-0  0/4    ImagePullBackOff  0         #d

  • The Kubernetes cluster hosting a Tanzu Postgres instance was upgraded in Cloud Director Container Service Extension (CSE).
  • Tanzu Postgres was deployed using Cloud Director extension for Data Solutions (DSE).
  • The following error is observed for the Postgres pod:

    kubelet  Back-off pulling image "tanzu-sql-postgres.packages.broadcom.com

  • The Tanzu Postgres Package Repository (pkgr) shows errors when getting and describing it:

    Get the pkgr:

    kubectl get pkgr -n vcd-ds-system tanzu-postgres-operator

    NAME                      AGE    DESCRIPTION
    tanzu-postgres-operator   #d     Reconcile failed: Fetching resources: Error (see .status.usefulErrorMessage for details)


    Describe the pkgr:

    kubectl describe pkgr -n vcd-ds-system tanzu-postgres-operator

    ...
    Error while preparing a transport to talk with the registry:
      Unable to create round tripper:
        GET https://tanzu-sql-postgres.packages.broadcom.com/artifactory/api/docker/tanzu-sql-postgres/v2/token?scope=repository%3Atds-packages%3Apull&service=tanzu-sql-postgres.packages.broadcom.com: :
          Token failed verification: expired
    ...

  • The Postgres pods cannot be started after the cluster upgrade.
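The DESCRIPTION column of the pkgr points to .status.usefulErrorMessage for details. As a minimal sketch, assuming kubectl access to the cluster and the namespace and repository name used in this article, that field can be printed directly with a JSONPath query and checked for the expired-token message. The function names pkgr_error and is_expired_token_error are hypothetical helpers for illustration only, not part of any product.

```shell
#!/bin/sh
# Print only the detailed error recorded on the package repository.
# Namespace and name are the ones used in this article.
pkgr_error() {
  kubectl get pkgr -n vcd-ds-system tanzu-postgres-operator \
    -o jsonpath='{.status.usefulErrorMessage}'
}

# Return success if the text on stdin contains the expired-token error
# quoted in the describe output.
is_expired_token_error() {
  grep -q 'Token failed verification: expired'
}

# Usage (requires cluster access):
#   pkgr_error | is_expired_token_error && echo "registry token has expired"
```

If the check succeeds, the symptoms match this article and the token stored in DSE most likely needs to be replaced.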

Environment

  • VMware Cloud Director 10.6.1.x
  • VMware Cloud Director extension for Data Solutions 1.6.x
  • VMware Cloud Director Container Service Extension 4.2.x
  • VMware Tanzu for Postgres on Kubernetes 3.0

Cause

This issue occurs because the Tanzu Postgres Data Solution API Token expires 48 hours after it is generated. Once the token has expired, the images can no longer be downloaded from the registry.
Upgrading the Kubernetes cluster requires these images to be downloaded again, so the pull fails if the token stored in DSE has expired.
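The 48-hour lifetime corresponds to the "expires_in" value of 172800 seconds that is returned together with the token (see step 2 of the Resolution). The following sketch converts that value to hours. It assumes the single-line response format shown in this article and uses sed instead of a JSON parser to avoid extra dependencies; token_lifetime_hours is a hypothetical helper name.

```shell
#!/bin/sh
# Read the token response on stdin and print the token lifetime in hours.
# Assumes the single-line form shown in this article, for example:
#   {"scope":"TNZ-Postgres","access_token":"<api_token>","expires_in":"172800"}
token_lifetime_hours() {
  # Extract the digits of "expires_in", with or without surrounding quotes.
  seconds=$(sed -n 's/.*"expires_in":"\{0,1\}\([0-9][0-9]*\)"\{0,1\}.*/\1/p')
  [ -n "$seconds" ] || return 1
  # 3600 seconds per hour, so 172800 / 3600 = 48.
  echo $((seconds / 3600))
}
```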

Resolution

To resolve the issue, generate a new API Token for Tanzu Postgres and update DSE with this new token.
Once a valid API Token is present in DSE, the Kubernetes Operator propagates it to the cluster, which allows the images to be downloaded.

  1. To generate a new API Token, log in as a user with permissions to download this solution and click the Token Download icon on the VMware Tanzu for Postgres on Kubernetes download page.
    NOTE: The token is valid for 48 hours after it is generated.

  2. Save the generated "<api_token>" value from the output, which has the following form:

    {"scope":"TNZ-Postgres","access_token":"<api_token>","expires_in":"172800"}

  3. In the Cloud Director Provider portal open Data Solutions > Container Registries.

  4. Select tanzu-sql-postgres.packages.broadcom.com and click Manage Credential.

  5. Replace the Password with the new API Token saved in step 2 and click Save.
    NOTE: The username should match the user that generated the token in step 1.

  6. Wait for the Tanzu Postgres pkgr to go into a healthy state of "Reconcile succeeded".
    This can be checked by running the command:

    kubectl get pkgr -n vcd-ds-system tanzu-postgres-operator

    NAME                      AGE    DESCRIPTION
    tanzu-postgres-operator   #d     Reconcile succeeded


  7. Once the pkgr is in a healthy state, the Tanzu Postgres pods should start successfully the next time Kubernetes retries the image pull, which is typically after five minutes.
    This can be checked by running the command:

    kubectl get pods -n vcd-ds-workloads

    NAME                        READY  STATUS            RESTARTS  AGE
    example-postgres-0          5/5    Running           0         #d
    example-postgres-1          5/5    Running           0         #d
    example-postgres-monitor-0  4/4    Running           0         #d
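
Steps 6 and 7 can also be scripted rather than checked by hand. The following is a minimal sketch, assuming kubectl access to the cluster and the namespace and repository name used in this article. The function names pkgr_is_healthy and wait_for_pkgr, and the default of 30 attempts at 10-second intervals, are arbitrary choices for illustration.

```shell
#!/bin/sh
# Return success if the `kubectl get pkgr` output on stdin reports
# the healthy state described in step 6.
pkgr_is_healthy() {
  grep -q 'Reconcile succeeded'
}

# Poll the package repository until it is healthy or the attempts run out.
# $1 = maximum attempts (default 30), $2 = seconds between attempts (default 10).
wait_for_pkgr() {
  attempts=${1:-30}
  interval=${2:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl get pkgr -n vcd-ds-system tanzu-postgres-operator | pkgr_is_healthy; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Usage (requires cluster access):
#   wait_for_pkgr && kubectl get pods -n vcd-ds-workloads --watch
```

The loop only reads the pkgr status and does not restart any pods, so the pods still start on the next image pull retry as described in step 7.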

Additional Information

For more information on managing the container registries and their credentials, see Managing container registries in VMware Cloud Director extension for Data Solutions.