TKG capi-controller-manager shows "invalid memory address or nil pointer dereference" restoring cluster with "velero restore"

Article ID: 297293

Products

VMware Tanzu Kubernetes Grid

Issue/Introduction

When performing a Velero restore, the capi-controller-manager pod goes into CrashLoopBackOff:
kubectl get pod -A | grep capi-controller-manager

Output:
NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
capi-system   capi-controller-manager-86f86fb9df-fhrwz   1/2     CrashLoopBackOff   14         65m

From the manager container logs you see the following error:
kubectl logs -n capi-system capi-controller-manager-86f86fb9df-fhrwz -c manager
runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)


Cause

Velero restores Cluster API (CAPI) resources in alphabetical order, which causes the ClusterResourceSetBinding to be created before the ClusterResourceSet. However, capi-controller-manager expects the ClusterResourceSet to exist before its ClusterResourceSetBinding, so it panics with a nil pointer dereference.
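To confirm this condition, you can list the Cluster API addon objects and check whether a ClusterResourceSetBinding exists without its corresponding ClusterResourceSet (the names in the output will vary per cluster):
kubectl get clusterresourcesets.addons.cluster.x-k8s.io -A
kubectl get clusterresourcesetbindings.addons.cluster.x-k8s.io -A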

Environment

Product Version: 1.0

Resolution

Workaround

You can work around the issue by updating the restore resource priority order in Velero.

Make a backup of your Velero deployment:

kubectl get deployment velero -n velero -o yaml > velero_deploy-bak.yaml
Patch the Velero deployment by adding the `--restore-resource-priorities` argument to the velero container (under .spec.template.spec.containers in the Deployment) so that the Cluster API resource types are placed at the end of the priority list. Velero restores the resource types in this list first, in the order given, and all remaining types (including ClusterResourceSetBindings) afterwards, so the ClusterResourceSet is restored before its binding.
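Before patching, you may want to check the current arguments of the velero container so that any flags already set are preserved in the patch (this assumes velero is the first container in the deployment):
kubectl get deployment velero -n velero -o jsonpath='{.spec.template.spec.containers[0].args}'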


Patch steps:

Create a patch file:
vi velero-patch-file.yaml

Paste the contents into velero-patch-file.yaml exactly as shown below:
spec:
  template:
    spec:
      containers:
      - args:
        - server
        - --features=
        - --restore-resource-priorities=customresourcedefinitions,namespaces,storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,persistentvolumes,persistentvolumeclaims,secrets,configmaps,serviceaccounts,limitranges,pods,replicasets.apps,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io
        name: velero

Patch the velero deployment:
kubectl patch deployment.apps/velero --patch "$(cat velero-patch-file.yaml)" -n velero
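After the patch is applied, you can optionally verify that the deployment rolled out and that the new argument is present:
kubectl rollout status deployment/velero -n velero
kubectl get deployment velero -n velero -o jsonpath='{.spec.template.spec.containers[0].args}'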

Note: The excerpt below shows the section of the patched Velero deployment manifest that is updated.
.
.
.
spec:
  containers:
  - args:
    - server
    - --features=
    - --restore-resource-priorities=customresourcedefinitions,namespaces,storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,persistentvolumes,persistentvolumeclaims,secrets,configmaps,serviceaccounts,limitranges,pods,replicasets.apps,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io
    command:
    - /velero


Next:
Use kubectl to delete the ClusterResourceSetBinding object that was restored before its ClusterResourceSet.
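For example, where <binding-name> and <namespace> are placeholders for the values from your cluster:
kubectl get clusterresourcesetbindings.addons.cluster.x-k8s.io -A
kubectl delete clusterresourcesetbinding <binding-name> -n <namespace>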

Finally:
Rerun your velero restore command.
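For example, assuming your backup is named <backup-name>:
velero restore create --from-backup <backup-name>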