Harbor Supervisor Service Fails to Reconcile and Harbor Pods in CrashLoopBackOff state
search cancel

Harbor Supervisor Service Fails to Reconcile and Harbor Pods in CrashLoopBackOff state

book

Article ID: 435859

calendar_today

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • When managing Harbor as a Supervisor Service on VMware vSphere with Tanzu, the package status remains in a ReconcileFailed state.
  • The Harbor UI is inaccessible, and the harbor-core and harbor-jobservice pods enter a CrashLoopBackOff status.
  • The describe output of the harbor pkgi shows the following error:

> kubectl describe pkgi <harbor-pkgi-name> -n <harbor-supervisor-service-namespace>

kapp: Error: waiting on reconcile deployment/harbor-jobservice... Finished unsuccessfully (Deployment is not progressing: ProgressDeadlineExceeded (message: ReplicaSet "harbor-jobservice-###" has timed out progressing."))

  • The harbor-core pod logs show the following error indicating a Redis persistence failure:

> kubectl logs harbor-core-#### -n <harbor-namespace>
YYYY-MM-DDThh:mm:ss [ERROR] [/lib/cache/cache.go:###]: failed to ping redis://<redis-hostname>:####/#?idle_timeout_seconds=30, retry after 500ms : MISCONF Redis is configured to save RDB snapshots, but it's currently unable to persist to disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.

Environment

  • VMware vSphere with Tanzu 8.x
  • Harbor Supervisor Service

Cause

  • The Redis instance used by Harbor fails to perform a background snapshot (RDB) to disk.
  • With the stop-writes-on-bgsave-error option set to yes, Redis stops accepting write commands upon this failure.
  • Because the harbor-core and harbor-jobservice components require write access to Redis for session management and job coordination, they fail to initialize and enter a crash loop.

Resolution

  1. Scale the Redis StatefulSet down to 0 and then back up to 1 to refresh the service state and clear the write-protection flag.

    kubectl scale sts/harbor-redis --replicas=0 -n <harbor-namespace>
    kubectl scale sts/harbor-redis --replicas=1 -n <harbor-namespace>

  2. Delete the failing pods so they reconnect to the restored Redis instance.

    kubectl delete pod <harbor-core-###> -n <harbor-namespace>
    kubectl delete pod <harbor-jobservice-###> -n <harbor-namespace>

  3. Verify that all pods in the Harbor namespace are successfully running.

    kubectl get pods -n <harbor-namespace>

  4. Confirm the Harbor package status displays as Reconcile succeeded.

    kubectl get pkgi -n <harbor-namespace>