Harbor UI inaccessible after Velero restore due to namespace mismatch and resource conflict



Article ID: 434046


Updated On:

Products

VMware Telco Cloud Automation

Issue/Introduction

After performing a Velero restore of a Harbor CNF, the following symptoms are observed:

  • Harbor UI is inaccessible (e.g., 403 Forbidden, 502 Bad Gateway, or Connection Refused).
  • Ingress NGINX Controller and MetalLB Controller pods are stuck in CrashLoopBackOff or Ready: 0/1.
  • MetalLB Speaker pods fail to advertise routes or assign External IPs to LoadBalancer services.

Log Evidence

  • Ingress-Nginx-Controller Logs:

W0304 #####  reflector.go:569] failed to list *v1.ConfigMap: configmaps is forbidden: User "#####" cannot list resource "configmaps" in API group "" at the cluster scope
E0304 #####  reflector.go:166] "Unhandled Error" err="Failed to watch *v1.Secret: secrets is forbidden: User \"#####\" cannot list resource \"secrets\" in API group \"\""

  • MetalLB-Controller Logs:

W0304  #####   reflector.go:561] failed to list apiextensions.k8s.io/v1, Kind=CustomResourceDefinition: customresourcedefinitions.apiextensions.k8s.io is forbidden: User "#####" cannot list resource "customresourcedefinitions" at the cluster scope

Environment

TCA 3.x

TCP 5.0

Cause

The issue is primarily caused by a Namespace Mismatch in the subjects section of the ClusterRoleBinding or RoleBinding objects following a Velero restore.

  1. Namespace Redirection
    During the restore process, the RBAC resources were incorrectly mapped with the namespace parameter set to an alternate or temporary restore namespace (e.g., harbor-restored) instead of the original namespace (harbor). Consequently, the API Server denies the pods' actual ServiceAccounts the permissions required to watch and list cluster resources.
  2. Unsupported Deployment Model (Resource Conflict)
    Restoring Harbor into the same cluster where an existing Harbor instance is already running is not a supported deployment model. Harbor generates several cluster-level resources (such as ClusterRoleBindings, IngressClasses, and ValidatingWebhookConfigurations). When multiple instances exist:
    • Global resources from the restored instance may overwrite or conflict with the active instance.
    • Multiple controllers attempt to manage the same global classes (for example, a single IngressClass claimed by two controllers), producing conflicting reconciliation and the functional failures observed in the logs.
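
For reference, a mis-mapped binding typically looks like the following minimal, illustrative sketch (the binding, role, and ServiceAccount names are placeholders, not the exact names from your cluster). The roleRef is intact, but the subject's namespace points at the restore namespace rather than harbor:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: example-ingress-nginx        # placeholder name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: example-ingress-nginx        # placeholder name
subjects:
- kind: ServiceAccount
  name: ingress-nginx
  namespace: harbor-restored         # incorrect: should be "harbor"
```

Because the ServiceAccount actually running the pods lives in the harbor namespace, the binding grants permissions to a ServiceAccount that does not exist there, and the API server returns the "forbidden" errors shown in the log evidence.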

Resolution

To resolve the issue, the RBAC bindings must be corrected to point to the original namespace where the pods are intended to run.

  1. Identify the Affected Bindings
    Identify the bindings that contain the incorrect namespace (harbor-restored):
    kubectl get clusterrolebinding -o json | jq -r '.items[] | select(.subjects[]?.namespace=="harbor-restored") | .metadata.name'
  2. Patch the Bindings
    Apply a patch to update the namespace field within the subjects array back to the original harbor namespace.
    Patch for Ingress Controller:
    kubectl patch clusterrolebinding ingress-ngi-95992-lurjn-ingress-nginx --type='json' -p='[{"op": "replace", "path": "/subjects/0/namespace", "value": "harbor"}]'
    Patch for MetalLB Controller:
    kubectl patch clusterrolebinding metallb-014-95992-bxazy-controller --type='json' -p='[{"op": "replace", "path": "/subjects/0/namespace", "value": "harbor"}]'
    Patch for MetalLB Speaker:
    kubectl patch clusterrolebinding metallb-014-95992-bxazy-speaker --type='json' -p='[{"op": "replace", "path": "/subjects/0/namespace", "value": "harbor"}]'
  3. Restart Impacted Pods
    Restart the pods to clear the CrashLoopBackOff and force them to re-authenticate with the corrected RBAC:
    kubectl delete pod -n harbor -l app.kubernetes.io/name=ingress-nginx
    kubectl delete pod -n harbor -l app.kubernetes.io/name=metallb
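
The identification filter and the effect of the patch can be exercised offline with jq against a sample binding list, before running the commands against a live cluster (the binding names below are illustrative placeholders, not taken from a real environment):

```shell
# Sample ClusterRoleBinding list mimicking a post-restore cluster
# (all names are illustrative placeholders).
cat <<'EOF' > /tmp/crb-list.json
{"items":[
  {"metadata":{"name":"example-ingress-nginx"},
   "subjects":[{"kind":"ServiceAccount","name":"ingress-nginx","namespace":"harbor-restored"}]},
  {"metadata":{"name":"healthy-binding"},
   "subjects":[{"kind":"ServiceAccount","name":"other","namespace":"harbor"}]}
]}
EOF

# The filter from step 1: list only bindings whose subjects reference
# the restore namespace.
jq -r '.items[] | select(.subjects[]?.namespace=="harbor-restored") | .metadata.name' /tmp/crb-list.json
# prints: example-ingress-nginx

# The patch from step 2, simulated locally: the JSON patch replaces
# /subjects/0/namespace with "harbor".
jq '.items[0].subjects[0].namespace = "harbor"' /tmp/crb-list.json > /tmp/crb-patched.json
jq -r '.items[0].subjects[0].namespace' /tmp/crb-patched.json
# prints: harbor
```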

Additional Information

Post-Recovery Verification & Best Practices

  1. Verify Status: Ensure all pods are in Running status and the Ingress service has an External IP assigned.
  2. Deployment Policy: Avoid running concurrent Harbor instances in the same cluster. If a restore is necessary for testing, it should be performed in a clean, isolated cluster to prevent cluster-level resource contention.
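
The External IP check from the verification step can be scripted. The jq expression below is the same one you would run against the output of `kubectl get svc -n harbor -o json`, demonstrated here on a sample LoadBalancer service object (the service name and IP are illustrative placeholders):

```shell
# Sample LoadBalancer service as returned by the API server
# (name and address are illustrative placeholders).
cat <<'EOF' > /tmp/svc.json
{"metadata":{"name":"harbor-ingress-controller"},
 "spec":{"type":"LoadBalancer"},
 "status":{"loadBalancer":{"ingress":[{"ip":"10.0.0.50"}]}}}
EOF

# Extract the external IP; an empty result means MetalLB has not yet
# assigned an address and the restore is not healthy.
jq -r '.status.loadBalancer.ingress[0].ip // empty' /tmp/svc.json
# prints: 10.0.0.50
```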