TKG capi-controller-manager shows "invalid memory address or nil pointer dereference" restoring cluster with "velero restore"

Article ID: 297293

Products

VMware Tanzu Kubernetes Grid

Issue/Introduction

When performing a Velero restore, the capi-controller-manager pod goes into CrashLoopBackOff:
kubectl get pod -A | grep capi-controller-manager

Output:
NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
capi-system   capi-controller-manager-86f86fb9df-fhrwz   1/2     CrashLoopBackOff   14         65m

From the manager container logs you see the following error:
kubectl logs -n capi-system capi-controller-manager-86f86fb9df-fhrwz -c manager
runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)


Cause

Velero restores Cluster API (CAPI) resources in alphabetical order, which causes the ClusterResourceSetBinding to be created before the ClusterResourceSet. However, capi-controller-manager expects the ClusterResourceSet to exist before its ClusterResourceSetBinding, so it panics with a nil pointer dereference.
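To confirm this condition, you can list the Cluster API addon objects and check whether a ClusterResourceSetBinding exists without its corresponding ClusterResourceSet (the names in the output will vary per cluster):
kubectl get clusterresourcesets.addons.cluster.x-k8s.io -A
kubectl get clusterresourcesetbindings.addons.cluster.x-k8s.io -A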

Environment

Product Version: 1.0

Resolution

Workaround

You can work around the issue by updating the restore resource priority order in Velero.

Make a backup of your Velero deployment:

kubectl get deployment velero -n velero -o yaml > velero_deploy-bak.yaml
Patch the Velero deployment by adding the `--restore-resource-priorities` argument to the velero container (under .spec.template.spec.containers in the Deployment) so that the Cluster API resource types are placed at the end of the priority list. Velero restores the resource types in this list first, in the order given, and all remaining types (including ClusterResourceSetBindings) afterwards, so the ClusterResourceSet is restored before its binding.
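Before patching, you may want to check the current arguments of the velero container so that any flags already set are preserved in the patch (this assumes velero is the first container in the deployment):
kubectl get deployment velero -n velero -o jsonpath='{.spec.template.spec.containers[0].args}'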


Patch steps:

Create a patch file:
vi velero-patch-file.yaml

Paste the contents into velero-patch-file.yaml exactly as shown below:
spec:
  template:
    spec:
      containers:
      - args:
        - server
        - --features=
        - --restore-resource-priorities=customresourcedefinitions,namespaces,storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,persistentvolumes,persistentvolumeclaims,secrets,configmaps,serviceaccounts,limitranges,pods,replicasets.apps,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io
        name: velero

Patch the velero deployment:
kubectl patch deployment.apps/velero --patch "$(cat velero-patch-file.yaml)" -n velero
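After the patch is applied, you can optionally verify that the deployment rolled out and that the new argument is present:
kubectl rollout status deployment/velero -n velero
kubectl get deployment velero -n velero -o jsonpath='{.spec.template.spec.containers[0].args}'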

Note: The excerpt below shows the section of the patched Velero deployment manifest that is updated.
.
.
.
spec:
  containers:
  - args:
    - server
    - --features=
    - --restore-resource-priorities=customresourcedefinitions,namespaces,storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,persistentvolumes,persistentvolumeclaims,secrets,configmaps,serviceaccounts,limitranges,pods,replicasets.apps,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io
    command:
    - /velero


Next:
Use kubectl to delete the ClusterResourceSetBinding object that was restored before its ClusterResourceSet.
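For example, where <binding-name> and <namespace> are placeholders for the values from your cluster:
kubectl get clusterresourcesetbindings.addons.cluster.x-k8s.io -A
kubectl delete clusterresourcesetbinding <binding-name> -n <namespace>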

Finally:
Rerun your velero restore command.
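For example, assuming your backup is named <backup-name>:
velero restore create --from-backup <backup-name>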