Receive error "daemonset pod not found in running state in node ####" when creating Velero backup

Article ID: 388328

Updated On:

Products

Tanzu Kubernetes Runtime

Issue/Introduction

When creating a backup with Velero, the backup fails with the error message "daemonset pod not found in running state in node ####", as shown below.

$ velero backup get backup-202502011215
NAME                  STATUS            ERRORS   WARNINGS   CREATED                     EXPIRES   STORAGE LOCATION   SELECTOR
backup-202502011215   PartiallyFailed   1        0          2025-02-01 00:12:15 +0200   21d       minio              <none>

Phase:  PartiallyFailed (run `velero backup logs backup-202502011215` for more information)

Errors:
  Velero:    name: /<POD_NAME> message: /Error backing up item error: /daemonset pod not found in running state in node <A_CONTROL_PLANE_NODE>
  Cluster:    <none>
  Namespaces: <none>

 

Environment

  • Tanzu Kubernetes Grid
  • Tanzu Platform for Kubernetes
  • Velero

Cause

By default, the Velero installation deploys the node-agent daemonset only on Kubernetes worker nodes, because control-plane nodes are tainted and the daemonset has no matching toleration. Velero needs a running node-agent pod on the same node as the pod being backed up. If a pod included in the backup runs on a control-plane node, no node-agent is available there, so the backup of that pod cannot be executed and Velero returns the error message "daemonset pod not found in running state in node ####".
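To confirm this cause in a given cluster, it can help to compare the nodes that run a node-agent pod with the taints on each node. The following is a minimal sketch, assuming Velero is installed in the `velero` namespace (override with `VELERO_NS`); the helper function names are hypothetical and only wrap standard `kubectl` commands.

```shell
# Assumption: Velero is installed in the "velero" namespace.
VELERO_NS="${VELERO_NS:-velero}"

# List the pods in the Velero namespace together with the node each one runs on.
# A control-plane node without a node-agent pod matches the cause described above.
show_node_agent_pods() {
  kubectl get pods -n "$VELERO_NS" -o wide
}

# List every node with the keys of its taints.
show_node_taints() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

Run `show_node_agent_pods` and then `show_node_taints`; the node named in the error message is expected to appear with a taint and without a node-agent pod.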

Resolution

Apply one of the following resolutions, depending on your environment:

  • Only Kubernetes system components should be deployed on control-plane nodes. If the system pods do not need to be backed up, append the appropriate `--exclude-####` option to `velero backup create` to exclude those namespaces or objects.
  • If the pods are custom workloads deployed on control-plane nodes, note that this configuration is not recommended; custom workloads should run on worker nodes. Configure the tolerations of the custom workloads so that they are not scheduled on control-plane nodes.
  • It is possible to deploy the Velero node-agent on control-plane nodes by adding a toleration to the node-agent daemonset. This is not recommended, for the reasons above.
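The three options above could be sketched as follows. This is an illustration under stated assumptions, not a definitive procedure: it assumes Velero runs in the `velero` namespace, that the daemonset is named `node-agent`, and that control-plane nodes carry the `node-role.kubernetes.io/control-plane` taint with effect `NoSchedule` (older clusters may use a different key). `--exclude-namespaces` is shown as one example of an `--exclude-####` option, and the helper function names are hypothetical.

```shell
# Assumptions (not from this article): namespace "velero", daemonset "node-agent",
# and the "node-role.kubernetes.io/control-plane:NoSchedule" taint on control-plane nodes.
VELERO_NS="${VELERO_NS:-velero}"

# Option 1: create a backup that leaves out one or more namespaces.
# $1 = backup name, $2 = comma-separated list of namespaces to exclude.
backup_excluding_namespaces() {
  velero backup create "$1" --exclude-namespaces "$2"
}

# Option 2: show which node a pod is scheduled on, to spot custom
# workloads that landed on a control-plane node.
# $1 = pod name, $2 = namespace.
show_pod_node() {
  kubectl get pod "$1" -n "$2" -o wide
}

# Option 3 (not recommended): let node-agent pods tolerate the control-plane taint.
# Note: this "add" operation sets the whole tolerations list and replaces
# any tolerations already defined on the daemonset.
TOLERATION_PATCH='[{"op":"add","path":"/spec/template/spec/tolerations","value":[{"key":"node-role.kubernetes.io/control-plane","operator":"Exists","effect":"NoSchedule"}]}]'

tolerate_control_plane() {
  kubectl -n "$VELERO_NS" patch daemonset node-agent --type=json -p "$TOLERATION_PATCH"
}
```

For example, `backup_excluding_namespaces <BACKUP_NAME> kube-system` creates a backup that skips the `kube-system` namespace. Check the taints actually present on the control-plane nodes before using the patch, since the key and effect assumed here may differ in your cluster.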