After upgrading both Velero and the Tanzu Kubernetes Release (TKR) to version 1.28 across TKGs (VKS) clusters running Supervisor 7, Velero failed to start. This caused a complete outage of Velero’s backup functionality across all impacted clusters.
The primary cause was Pod Security Admission (PSA) enforcement at the restricted level, which blocked Velero’s ReplicaSet from creating pods because the pod spec was missing required securityContext fields. A second issue, CPU resource exhaustion, prevented the Velero pod from scheduling even after the PSA configuration was corrected.
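A PSA rejection of this kind can usually be confirmed from the namespace labels and the ReplicaSet events. The following is a minimal sketch, not a definitive procedure: the function name psa_diagnose is hypothetical, and the namespace is assumed to be velero unless another is passed.

```shell
# Hypothetical helper; the namespace defaults to "velero" (an assumption).
psa_diagnose() {
  psa_ns="${1:-velero}"
  # Show the pod-security.kubernetes.io/* labels currently set on the namespace.
  kubectl get namespace "$psa_ns" --show-labels
  # List pod-creation failures; PSA rejections of ReplicaSet pods are
  # recorded as FailedCreate events with a "violates PodSecurity" message.
  kubectl get events -n "$psa_ns" --field-selector reason=FailedCreate
}
```

Running psa_diagnose with no arguments inspects the velero namespace. The event messages name the securityContext fields that the restricted level requires.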
Tanzu Kubernetes Runtime
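Two root causes were identified: PSA enforcement at the restricted level, which rejected Velero pods that lacked the required securityContext fields, and CPU exhaustion on the nodes, which kept the Velero pod from scheduling.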
Step 1: Lower PSA Enforcement Level
To allow Velero to start, reduce the PSA level from restricted to baseline in the Velero namespace:
kubectl label ns velero pod-security.kubernetes.io/enforce=baseline --overwrite
Then restart the Velero deployment:
kubectl rollout restart deployment -n velero
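After the restart, it may help to confirm that the rollout completed and that the pod was created. The sketch below assumes the Deployment is named velero (the usual default, but not stated in this article); the function name velero_verify is hypothetical.

```shell
# Hypothetical helper; namespace and Deployment name both default to
# "velero", which is an assumption about the installation.
velero_verify() {
  vv_ns="${1:-velero}"
  vv_deploy="${2:-velero}"
  # Wait for the rollout to finish; exits non-zero if it does not
  # complete within the timeout.
  kubectl rollout status "deployment/$vv_deploy" -n "$vv_ns" --timeout=120s
  # List the pods with their status and the node each one is running on.
  kubectl get pods -n "$vv_ns" -o wide
}
```

If the rollout times out and the pod is shown as Pending, the CPU check in Step 2 is the likely next step.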
Step 2: Resolve CPU Resource Constraints (if applicable)
If the Velero pod still fails to schedule after the PSA change, the nodes may not have enough allocatable CPU left for it.
Assess node capacity with the following command:
kubectl describe nodes | grep -A5 "Allocatable"
Or use the TMC UI:
Clusters > [cluster name] > Nodes
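To tie the capacity figures to the Velero pod itself, the scheduler events can be checked alongside the per-node allocatable values. This is a sketch under the assumption that Velero runs in the velero namespace; the function name cpu_diagnose is hypothetical.

```shell
# Hypothetical helper; the namespace defaults to "velero" (an assumption).
cpu_diagnose() {
  cd_ns="${1:-velero}"
  # Allocatable CPU and memory, one row per node.
  kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
  # Pods still in the Pending phase, which includes pods waiting to be scheduled.
  kubectl get pods -n "$cd_ns" --field-selector status.phase=Pending
  # Scheduler events; a message containing "Insufficient cpu" points to
  # CPU exhaustion on the nodes.
  kubectl get events -n "$cd_ns" --field-selector reason=FailedScheduling
}
```

A FailedScheduling event that reports insufficient CPU on every node suggests that the cluster needs more capacity or fewer competing requests.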
Depending on findings, consider rebalancing workloads or scaling out the cluster.