The vsphere-csi-controller
pod will repeatedly enter a CrashLoopBackOff
state.
NAMESPACE POD NAME READY STATUS RESTARTS AGE
vmware-system-csi vsphere-csi-controller-xxxxxxx-xxxx 6/7 CrashLoopBackOff 2689 (60s ago) 47d
vmware-system-csi vsphere-csi-controller-xxxxxxx-xxxx 7/7 Running 2743 (6m10s ago) 83d
vmware-system-csi vsphere-csi-controller-xxxxxxx-xxxx 7/7 Running 2731 (14m ago) 84d
Describing the pod returns the following errors:
Warning BackOff 39s kubelet Back-off restarting failed container vsphere-syncer in pod vsphere-csi-controller-xxxxxx-xxxxx
The syncer.log file indicates a segmentation fault due to an invalid memory access or nil pointer dereference:
YYYY-MM-DDThh:mm:ss.000Z stderr F {"level":"info","time":"YYYY-MM-DDThh:mm:ss.000Z","caller":"wcp/controller.go:910","msg":"CreateVolume: called with args {Name:pvc-xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx CapacityRange:required_bytes:5368709120 ..."}
YYYY-MM-DDThh:mm:ss.000Z stderr F {"level":"error","time":"YYYY-MM-DDThh:mm:ss.000Z","caller":"wcp/controller.go:940","msg":"file volume provisioning is not supported on a stretched supervisor cluster"}
Error messages in the driver logs related to file volume provisioning:
YYYY-MM-DDThh:mm:ss.000Z stderr F {"level":"error","time":"YYYY-MM-DDThh:mm:ss.000Z","caller":"wcp/controller.go:940","msg":"File volume provisioning is not supported on a stretched supervisor cluster",
"TraceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx",
"stacktrace": "sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp."
}
There also might be PVCs stuck in Pending or Terminating state (which are "ReadWriteMany"):
# k get pvc -A | grep Terminating | wc -l
45
# k get pvc -A | grep Pending | wc -l
19
Validate that customer is using stretched/Three-zone supervisor clusters via vCenter:
(Note: In a vCenter Log Bundle the file commands/wcp-db-dump.py.txt can be reviewed.)
# /usr/lib/vmware-wcp/wcp-db-dump.py | jq ".dump.cluster_db_configs[].desired_config.DeploymentTarget"
{
"FaultDomainZones": [
{
"ID": "vks-c1",
"ClusterComputeResources": [
{
"type": "ClusterComputeResource",
"value": "domain-c##"
}
]
},
{
"ID": "vks-c2",
"ClusterComputeResources": [
{
"type": "ClusterComputeResource",
"value": "domain-c##"
}
]
},
{
"ID": "vks-c3",
"ClusterComputeResources": [
{
"type": "ClusterComputeResource",
"value": "domain-c##"
}
]
}
]
}
VMware vSphere with Tanzu
File volume provisioning is not supported on a stretched Supervisor Cluster.
See documentation: Supervisor Storage
Delete the PV and PVCs
kubectl -n <namespace> delete pv <pv-name>
kubectl -n <namespace> delete pvc <pvc-name>
Scale down and up the CSI controllers
kubectl scale deployment vsphere-csi-controller --replicas=0 -n vmware-system-csi
kubectl scale deployment vsphere-csi-controller --replicas=3 -n vmware-system-csi