Error "Skip pod volume <VOLUME_NAME> error: daemonset pod not found in running state in node <NODE_NAME>" during workload cluster backup using Velero

Article ID: 434912

Products

VMware Tanzu Kubernetes Grid Management

Issue/Introduction

  • Velero backups complete with a PartiallyFailed phase.
  • The backup logs display the following error message for skipped volumes: Skip pod volume <VOLUME_NAME> error: daemonset pod not found in running state in node <NODE_NAME>
  • This failure occurs despite the node-agent pods being confirmed in a Running state on the impacted worker nodes.
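
The following is a minimal sketch for finding which backups show this symptom. It lists Backup objects whose status.phase is PartiallyFailed and prints the ones whose logs contain the node-agent discovery error. It makes three assumptions: Velero runs in the velero namespace, the velero CLI is installed and can read backup logs, and the function name find_affected_backups is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of PartiallyFailed backups whose logs
# contain the "daemonset pod not found in running state" error.
# Assumptions: Velero namespace defaults to "velero"; velero CLI is installed.
find_affected_backups() {
  local ns="${1:-velero}"
  local name phase
  # One line per Backup: "<name> <status.phase>".
  kubectl get backups.velero.io -n "$ns" \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase --no-headers |
  while read -r name phase; do
    # Only PartiallyFailed backups are of interest here.
    [ "$phase" = "PartiallyFailed" ] || continue
    # Search this backup's log for the node-agent discovery error.
    if velero backup logs "$name" --namespace "$ns" 2>/dev/null |
        grep -q "daemonset pod not found in running state"; then
      echo "$name"
    fi
  done
}
```

Usage: `find_affected_backups velero`. An empty result means no PartiallyFailed backup in that namespace logged this particular error.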

Environment

VMware Tanzu Kubernetes Grid Multicloud (TKGm) 3.4+v1.33

Velero 1.16

Cause

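  • This is a known issue in Velero v1.16.0. To check whether a node-agent pod is running on a given node, the controller lists node-agent pods using a hardcoded label selector, role=node-agent.
  • The default node-agent DaemonSet pod template in this environment does not set the role label (its pods are labeled name=node-agent). The lookup therefore matches no pods, and Velero skips the volume even though a node-agent pod is Running on that node.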
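
The mismatch described above can be confirmed by counting node-agent pods under each selector. The sketch below assumes the velero namespace, and check_node_agent_labels is a hypothetical name. It returns 1 when pods are found with name=node-agent but none with role=node-agent, which is the condition this article describes.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: compare the pods matched by the label the DaemonSet
# already uses (name=node-agent) with the selector Velero v1.16.0 queries
# (role=node-agent). Assumption: namespace defaults to "velero".
check_node_agent_labels() {
  local ns="${1:-velero}"
  local by_name by_role
  # Count pods matched by the existing label; tr strips padding some wc versions add.
  by_name=$(kubectl get pods -n "$ns" -l name=node-agent --no-headers 2>/dev/null | wc -l | tr -d ' ')
  # Count pods matched by the selector Velero uses to find the local node-agent pod.
  by_role=$(kubectl get pods -n "$ns" -l role=node-agent --no-headers 2>/dev/null | wc -l | tr -d ' ')
  echo "name=node-agent: ${by_name} pod(s); role=node-agent: ${by_role} pod(s)"
  if [ "$by_name" -gt 0 ] && [ "$by_role" -eq 0 ]; then
    echo "role label missing: this cluster is likely affected"
    return 1
  fi
  return 0
}
```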

Resolution

  1. Connect to the impacted Kubernetes cluster with administrative privileges.

  2. Verify the node-agent pods are actively running on the worker nodes:

    kubectl get pods -n velero -l name=node-agent -o wide

  3. Patch the node-agent DaemonSet to inject the required role label into the pod template:

    kubectl patch daemonset node-agent -n velero --type=json -p='[{"op": "add", "path": "/spec/template/metadata/labels/role", "value": "node-agent"}]'

  4. Because the patch changes the pod template, the DaemonSet replaces its pods. Monitor the rollout and confirm that the new node-agent pods reach the Running state on every worker node.

  5. Re-run the Velero backup and confirm that it no longer ends in the PartiallyFailed phase and that the volume data is uploaded successfully.
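
Steps 3 and 4 can be combined into one idempotent sketch, shown below. It assumes the DaemonSet is named node-agent in the velero namespace, and ensure_node_agent_role_label is a hypothetical name. The script reads the role label from the pod template and applies the same JSON patch as step 3 only when the label is absent. It then waits on `kubectl rollout status` and lists the pods that the role=node-agent selector now matches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add role=node-agent to the node-agent pod template only
# if it is missing, then wait for the DaemonSet rollout to finish.
# Assumptions: DaemonSet "node-agent" in namespace "velero" (default below).
ensure_node_agent_role_label() {
  local ns="${1:-velero}"
  local current
  # Read the role label from the pod template; empty output means it is absent.
  current=$(kubectl get daemonset node-agent -n "$ns" \
    -o jsonpath='{.spec.template.metadata.labels.role}')
  if [ "$current" = "node-agent" ]; then
    echo "role label already present; nothing to patch"
    return 0
  fi
  # Same JSON patch as step 3.
  kubectl patch daemonset node-agent -n "$ns" --type=json \
    -p='[{"op": "add", "path": "/spec/template/metadata/labels/role", "value": "node-agent"}]' || return 1
  # Step 4: block until the updated pods are ready, giving up after 5 minutes.
  kubectl rollout status daemonset/node-agent -n "$ns" --timeout=5m || return 1
  # Show the pods the role=node-agent selector now matches, with their nodes.
  kubectl get pods -n "$ns" -l role=node-agent -o wide
}
```

Checking the label first means a re-run does not trigger another rollout. Run `ensure_node_agent_role_label velero`, then repeat the backup as in step 5.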

Additional Information

Velero Upstream Issue 9361