One vsphere-csi-controller pod is constantly in CrashLoopBackOff
root@423c7fe9845f03bca3aa00e328ca200e [ ~ ]# k get pods -n vmware-system-csi -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
vsphere-csi-controller-X 7/7 Running 350 (18m ago) 21h X X <none> <none>
vsphere-csi-controller-X 6/7 CrashLoopBackOff 5 (101s ago) 4m45s X X <none> <none>
vsphere-csi-controller-X 7/7 Running 337 (13m ago) 21h X X <none> <none>
The logs of the vsphere-syncer container show it crashing with a nil pointer dereference:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
goroutine 302 [running]:
sigs.k8s.io/vsphere-csi-driver/v3/pkg/syncer.calculateVolumeSnapshotReservedForNamespace({0x0, 0x0}, {0x0, 0x0}, 0x0)
/build/mts/release/bora-X/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/syncer/metadatasyncer.go:1322 +0x0
sigs.k8s.io/vsphere-csi-driver/v3/pkg/syncer.syncStorageQuotaReserved({0x0, 0x0}, {0x0, 0x0}, 0x0)
/build/mts/release/bora-X/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/syncer/metadatasyncer.go:1137 +0x0
created by sigs.k8s.io/vsphere-csi-driver/v3/pkg/syncer.initStorageQuotaPeriodicSync.func1 in goroutine 301
/build/mts/release/bora-X/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/syncer/metadatasyncer.go:1071 +0x0
A StoragePolicyQuota custom resource has a negative used value:
apiVersion: cns.vmware.com/v1alpha2
kind: StoragePolicyQuota
...
- extensionName: snapshot.cns.vsphere.vmware.com
extensionQuotaUsage:
- scQuotaUsage:
reserved: 819Gi
used: -36Mi <--- NEGATIVE
VCF 9.0
The issue is caused by VolumeSnapshots whose source PVC was deleted, leaving the snapshots with no underlying PVC.
1. Check whether any StoragePolicyQuota custom resource has a negative used value using the following command.
kubectl get storagepolicyquotas.cns.vmware.com -A -o json | jq -r '
.items[] |
select(
.status | [.. | .used? | select(. != null) | tostring | startswith("-")] | any
) |
"Namespace: \(.metadata.namespace) | Name: \(.metadata.name)"
'
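For any namespace and name pair the query returns, the full object can be dumped to inspect the negative value, for example:
kubectl get storagepolicyquotas.cns.vmware.com <quota-name> -n <namespace> -o yaml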
2. Identify VolumeSnapshots that never became ready and their source PVCs using the following command.
kubectl get volumesnapshots -n <namespace> -o json | jq -r '
.items[] |
select(.metadata.deletionTimestamp == null) |
select(.spec.source.persistentVolumeClaimName != null) |
select(.status == null or .status.readyToUse == null or .status.readyToUse == false) |
"\(.metadata.name) -> PVC: \(.spec.source.persistentVolumeClaimName)"'
3. For each PVC name returned, verify that it no longer exists (the sketch after this step automates the check for every snapshot in one pass).
kubectl get pvc <pvc-name> -n <namespace>
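Steps 2 and 3 can also be combined into a single pass. This is a minimal sketch, not part of the product tooling, assuming a bash shell with jq available and the target namespace substituted for <namespace>; it flags every active snapshot whose source PVC is missing:
NS=<namespace>
kubectl get volumesnapshots -n "$NS" -o json | jq -r '
.items[] |
select(.metadata.deletionTimestamp == null) |
select(.spec.source.persistentVolumeClaimName != null) |
"\(.metadata.name) \(.spec.source.persistentVolumeClaimName)"' |
while read -r snap pvc; do
  # Print the snapshot as orphaned when its source PVC no longer exists
  kubectl get pvc "$pvc" -n "$NS" >/dev/null 2>&1 || echo "Orphaned: $snap (missing PVC: $pvc)"
done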
4. Delete each VolumeSnapshot whose PVC no longer exists.
kubectl delete volumesnapshot <snapshot-name> -n <namespace>
If the deletion hangs, cancel it and patch the VolumeSnapshot to remove the finalizer:
kubectl patch volumesnapshot <snapshot-name> -n <namespace> -p '{"metadata":{"finalizers":[]}}' --type=merge
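Once the orphaned VolumeSnapshots are deleted, re-run the query from step 1; it should return no results, and the previously crashing pod should stabilize after its next restart:
kubectl get pods -n vmware-system-csi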