After enabling the Velero Operator service, provisioning of PVCs in the guest clusters fails



Article ID: 323412


Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
The following can be observed after the Velero Operator is enabled and a second guest cluster is deployed:

    kubectl get events
    LAST SEEN TYPE REASON OBJECT MESSAGE
    3s Normal ExternalProvisioning persistentvolumeclaim/my-pvc waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator

    kubectl get pvc
    NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
    my-pvc Pending tanzu-k8s-custom-policy 41s

    kubectl get sc
    NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
    tanzu-k8s-custom-policy (default) csi.vsphere.vmware.com Delete Immediate true 23h



In the vmware-system-csi namespace, the vsphere-csi-controller-* pod is stuck in ContainerCreating:

    kubectl get all -n vmware-system-csi
    NAME READY STATUS RESTARTS AGE
    pod/vsphere-csi-controller-66b875d646-95bq5 0/6 ContainerCreating 0 23h
    pod/vsphere-csi-node-458ds 3/3 Running 0 23h
    pod/vsphere-csi-node-t82vt 3/3 Running 0 23h

    NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
    daemonset.apps/vsphere-csi-node 2 2 2 2 2 none 23h

    NAME READY UP-TO-DATE AVAILABLE AGE
    deployment.apps/vsphere-csi-controller 0/1 1 0 23h

    NAME DESIRED CURRENT READY AGE
    replicaset.apps/vsphere-csi-controller-66b875d646 1 1 0 23h



The events in the same namespace show that the secret "pvcsi-provider-creds" cannot be found:

    kubectl get events -n vmware-system-csi
    LAST SEEN TYPE REASON OBJECT MESSAGE
    38s Warning FailedMount pod/vsphere-csi-controller-66b875d646-95bq5 MountVolume.SetUp failed for volume "pvcsi-provider-volume" : secret "pvcsi-provider-creds" not found
    25m Warning FailedMount pod/vsphere-csi-controller-66b875d646-95bq5 Unable to attach or mount volumes: unmounted volumes=[pvcsi-provider-volume], unattached volumes=[socket-dir vsphere-csi-controller-token-jqqln pvcsi-provider-volume pvcsi-config-volume]: timed out waiting for the condition
    45m Warning FailedMount pod/vsphere-csi-controller-66b875d646-95bq5 Unable to attach or mount volumes: unmounted volumes=[pvcsi-provider-volume], unattached volumes=[pvcsi-config-volume socket-dir vsphere-csi-controller-token-jqqln pvcsi-provider-volume]: timed out waiting for the condition
    29m Warning FailedMount pod/vsphere-csi-controller-66b875d646-95bq5 Unable to attach or mount volumes: unmounted volumes=[pvcsi-provider-volume], unattached volumes=[vsphere-csi-controller-token-jqqln pvcsi-provider-volume pvcsi-config-volume socket-dir]: timed out waiting for the condition
    4m58s Warning FailedMount pod/vsphere-csi-controller-66b875d646-95bq5 Unable to attach or mount volumes: unmounted volumes=[pvcsi-provider-volume], unattached volumes=[pvcsi-provider-volume pvcsi-config-volume socket-dir vsphere-csi-controller-token-jqqln]: timed out waiting for the condition



Note: The preceding log excerpts are only examples. Dates, times, and object names will vary depending on your environment.

Environment

VMware vSphere 7.0.x

Cause

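The log file /var/log/pods/vmware-system-tkg_vmware-system-tkg-controller-manager-xxxx/manager/x.log contains errors like the following. They show that the secret for the provider service account cannot be synced because the `velero-vsphere-plugin-backupdriver` namespace is not found: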

    2021-04-20T01:47:20.021506556Z stderr F E0420 01:47:20.021328       1 serviceaccount_controller.go:147] vmware-system-tkg-controller-manager/provider-serviceaccount-controller/development/test-velero "msg"="Error ensuring provider serviceaccounts" "error"="unable to sync secret for provider serviceaccount test-velero-pvbackupdriver: namespaces \"velero-vsphere-plugin-backupdriver\" not found"
    2021-04-20T01:47:20.022474396Z stderr F E0420 01:47:20.022104       1 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="unable to sync secret for provider serviceaccount test-velero-pvbackupdriver: namespaces \"velero-vsphere-plugin-backupdriver\" not found" "controller"="provider-serviceaccount-controller" "name"="test-velero" "namespace"="development"
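
To check whether an environment shows the same cause, the controller manager logs can be searched for the error message above. The following is a minimal sketch, assuming shell access to the node that holds these log files. The function name `find_provider_sa_errors` is hypothetical, and the glob in the example call only stands in for the pod suffix and log file name, which differ per environment.

```shell
# Hypothetical helper: search the given log files for the reconcile error
# quoted in this article. Prints matching lines without file name prefixes
# and returns non-zero when no line matches.
find_provider_sa_errors() {
  grep -h 'Error ensuring provider serviceaccounts' "$@"
}

# Example call (the glob stands in for the environment-specific pod suffix
# and log file name):
# find_provider_sa_errors /var/log/pods/vmware-system-tkg_vmware-system-tkg-controller-manager-*/manager/*.log
```

If matching lines also contain `namespaces \"velero-vsphere-plugin-backupdriver\" not found`, the environment is likely affected by the issue described here.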

Resolution

VMware is working on a fix that is planned for a future release (7.0U3 p03).

Workaround:
The workaround is to either install the Velero plugin in each guest cluster, or create the `velero-vsphere-plugin-backupdriver` namespace in each guest cluster. After that, the controller will eventually create all the secrets.
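
The namespace option of the workaround could be scripted roughly as follows. This is a sketch under assumptions, not a supported procedure: it assumes `kubectl` is configured with one kubeconfig context per guest cluster, and the helper names `ensure_backupdriver_namespace` and `csi_secret_present` as well as the context name in the usage example are hypothetical. The secret name and its namespace are taken from the FailedMount events shown in the symptoms.

```shell
# Namespace named in the workaround.
BACKUPDRIVER_NS="velero-vsphere-plugin-backupdriver"

# Hypothetical helper: create the namespace in one guest cluster if missing.
# $1 = kubeconfig context of the guest cluster (assumed to exist).
ensure_backupdriver_namespace() {
  ctx="$1"
  if kubectl --context "$ctx" get namespace "$BACKUPDRIVER_NS" >/dev/null 2>&1; then
    echo "namespace $BACKUPDRIVER_NS already present in $ctx"
  else
    kubectl --context "$ctx" create namespace "$BACKUPDRIVER_NS"
  fi
}

# Hypothetical helper: succeed if the secret the CSI controller pod is
# waiting for exists in the guest cluster.
# $1 = kubeconfig context of the guest cluster.
csi_secret_present() {
  kubectl --context "$1" get secret pvcsi-provider-creds -n vmware-system-csi >/dev/null 2>&1
}
```

A possible usage, with `my-guest-cluster` as a placeholder context name: run `ensure_backupdriver_namespace my-guest-cluster`, then repeat `csi_secret_present my-guest-cluster` after some time. Because the secrets are created eventually rather than immediately, the check may fail at first.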

Additional Information

- https://github.com/vmware-tanzu/velero-plugin-for-vsphere
- https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/main/docs/troubleshooting.md

Impact/Risks:
After the Velero Operator is enabled, PVCs cannot be provisioned in the affected guest clusters.
Every TKG cluster deployed from then on hits the same issue.