message: 'error running restore err=chdir /host_pods/4c3b4055-4815-4e9b-b131-75a94c762748/volumes/kubernetes.io~csi/pvc-xxxx-xxxx-xxxx-xxxx-xxxx/mount:
no such file or directory: chdir /host_pods/4c3b4055-4815-4e9b-b131-75a94c762748/volumes/kubernetes.io~csi/pvc-xxxcccc-xxxx-xxxx-xxxx-xxxxxxxxxx/mount:
no such file or directory'
This can be observed in the Restic pod node-agent-xxx. This issue doesn't seem to be with Velero or how Kubernetes handles StatefulSets, but with the MinIO operator.
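To pull this error out of the logs, you can grep the node-agent pods directly. A minimal sketch, assuming Velero runs in the velero namespace and the node-agent pods carry the default name=node-agent label:

// find the node-agent (Restic) pods and grep their logs for the restore error
$ kubectl -n velero get pod -l name=node-agent
$ kubectl -n velero logs -l name=node-agent --all-containers | grep 'error running restore'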
This is what was observed during the restore:
The pods were being terminated because somebody had updated the StatefulSet (its generation had increased since the restore started). When a pod is terminated, its volume mount path under /host_pods/<pod-uid>/... disappears, which is why the Restic restore fails with the chdir error above. In general, an "operator" or "controller" is responsible for watching and updating the resources that it owns, in this case the StatefulSet.
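A quick way to confirm that the operator touched the StatefulSet is to check its generation, which increments on every spec update. A sketch only; substitute your tenant's namespace and StatefulSet name:

// generation goes up each time the StatefulSet spec is modified
$ kubectl -n <tenant-namespace> get statefulset <statefulset-name> -o jsonpath='{.metadata.generation}{"\n"}'
// or watch it live while the restore is running
$ kubectl -n <tenant-namespace> get statefulset <statefulset-name> -w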
Disable the reconciliation process by scaling the operator's pods to 0, or pause active reconciliation.
The validated workaround is to temporarily disable the MinIO operator before starting the restore and re-enable it once the restore completes. You can do this by editing the minio-operator deployment and setting replicas to 0 (you may need to change replicas via Helm instead if the deployment is itself actively managed and gets reset automatically).
The same would apply to other subsystems that actively monitor a Deployment or StatefulSet.
$ kubectl -n minio-operator edit deploy minio-operator
// verify no minio-operator pods
$ kubectl -n minio-operator get pod
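If you prefer not to edit the deployment interactively, kubectl scale achieves the same thing. A sketch, assuming the deployment is named minio-operator in the minio-operator namespace as above:

// scale the operator down before starting the restore
$ kubectl -n minio-operator scale deploy minio-operator --replicas=0
// ...run the restore...
// scale it back up afterwards (use whatever the original replica count was)
$ kubectl -n minio-operator scale deploy minio-operator --replicas=1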
CSI snapshots are an alternative that would work since, unlike FSB/Restic, they do not depend on the pods for volume restores. But CSI snapshot support is not yet ready on TKG(S) and is therefore not available for such clusters through TMC today.
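For reference, on clusters where the CSI driver does support snapshots, enabling this path in Velero typically means installing with the CSI feature flag and labelling a VolumeSnapshotClass for Velero to use. A sketch only (plugin version and class name are placeholders), not applicable to TKG(S) through TMC today:

// install Velero with the CSI feature enabled (other install flags omitted)
$ velero install --features=EnableCSI --plugins velero/velero-plugin-for-csi:v0.5.0 ...
// tell Velero which VolumeSnapshotClass to use for CSI snapshots
$ kubectl label volumesnapshotclass <class-name> velero.io/csi-volumesnapshot-class="true"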
If the restore gets hung and you don't want to wait for it to fail, just restart the Velero pod:
$ kubectl -n velero rollout restart deploy velero
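Before restarting, you can check whether the restore is actually stuck by looking at its phase and progress; a sketch, assuming the velero CLI is installed (the Restore custom resources in the velero namespace show the same information):

// list restores and their current phase
$ velero restore get
// inspect progress, warnings and errors for a specific restore
$ velero restore describe <restore-name> --details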
The area of the code in the MinIO operator responsible for updating the StatefulSet: