vsphere-syncer in CrashLoopBackOff with "panic: runtime error: invalid memory address or nil pointer dereference"

Article ID: 387809

Products

VMware vSphere Kubernetes Service

Issue/Introduction

The vsphere-csi-controller pod repeatedly enters a CrashLoopBackOff state because its vsphere-syncer container keeps crashing:

NAMESPACE             POD NAME                                     READY   STATUS             RESTARTS         AGE
vmware-system-csi     vsphere-csi-controller-xxxxxxx-xxxx          6/7     CrashLoopBackOff   2689 (60s ago)   47d
vmware-system-csi     vsphere-csi-controller-xxxxxxx-xxxx          7/7     Running            2743 (6m10s ago) 83d
vmware-system-csi     vsphere-csi-controller-xxxxxxx-xxxx          7/7     Running            2731 (14m ago)   84d

Describing the pod shows the following warning event:

Warning  BackOff  39s  kubelet  Back-off restarting failed container vsphere-syncer in pod vsphere-csi-controller-xxxxxx-xxxxx
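
If the cluster is still reachable, the previous logs of the crashing container can also be pulled directly from the pod (the pod name below is a placeholder):

kubectl -n vmware-system-csi logs vsphere-csi-controller-xxxxxx-xxxxx -c vsphere-syncer --previous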

The syncer.log file shows that the panic ("invalid memory address or nil pointer dereference") follows a CreateVolume request that fails file volume provisioning:

YYYY-MM-DDThh:mm:ss.000Z stderr F {"level":"info","time":"YYYY-MM-DDThh:mm:ss.000Z","caller":"wcp/controller.go:910","msg":"CreateVolume: called with args {Name:pvc-xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx CapacityRange:required_bytes:5368709120 ..."}
YYYY-MM-DDThh:mm:ss.000Z stderr F {"level":"error","time":"YYYY-MM-DDThh:mm:ss.000Z","caller":"wcp/controller.go:940","msg":"file volume provisioning is not supported on a stretched supervisor cluster"}

Error messages in the driver logs related to file volume provisioning:

YYYY-MM-DDThh:mm:ss.000Z stderr F {"level":"error","time":"YYYY-MM-DDThh:mm:ss.000Z","caller":"wcp/controller.go:940","msg":"File volume provisioning is not supported on a stretched supervisor cluster",
    "TraceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx",
    "stacktrace": "sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp."
}
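
When working from a collected log bundle instead of a live cluster, the same failure can be located in syncer.log with a simple search; the pattern below matches both the panic text and the provisioning error:

grep -iE "nil pointer dereference|not supported on a stretched" syncer.log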

There may also be ReadWriteMany PVCs stuck in the Pending or Terminating state:

# kubectl get pvc -A | grep Terminating | wc -l
45
# kubectl get pvc -A | grep Pending | wc -l
19
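
To list the affected ReadWriteMany claims rather than just counting them, a jsonpath filter along these lines can be used (a sketch; it assumes each claim declares a single access mode):

kubectl get pvc -A -o jsonpath='{range .items[?(@.spec.accessModes[0]=="ReadWriteMany")]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'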

Validate via vCenter that the environment is running a stretched (three-zone) Supervisor Cluster:
(Note: In a vCenter log bundle, the file commands/wcp-db-dump.py.txt contains the same output.)

# /usr/lib/vmware-wcp/wcp-db-dump.py | jq ".dump.cluster_db_configs[].desired_config.DeploymentTarget"
{
  "FaultDomainZones": [
    {
      "ID": "vks-c1",
      "ClusterComputeResources": [
        {
          "type": "ClusterComputeResource",
          "value": "domain-c##"
        }
      ]
    },
    {
      "ID": "vks-c2",
      "ClusterComputeResources": [
        {
          "type": "ClusterComputeResource",
          "value": "domain-c##"
        }
      ]
    },
    {
      "ID": "vks-c3",
      "ClusterComputeResources": [
        {
          "type": "ClusterComputeResource",
          "value": "domain-c##"
        }
      ]
    }
  ]
}
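
As a quick check, counting the fault domain zones in the same dump distinguishes a stretched Supervisor (more than one zone) from a single-zone one; the output of 3 below corresponds to the three-zone example above:

# /usr/lib/vmware-wcp/wcp-db-dump.py | jq '.dump.cluster_db_configs[].desired_config.DeploymentTarget.FaultDomainZones | length'
3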

Environment

VMware vSphere with Tanzu

Cause

File volume provisioning is not supported on a stretched Supervisor Cluster.

See documentation: Supervisor Storage

Resolution

Delete the affected PVs and PVCs (PersistentVolumes are cluster-scoped, so no namespace flag is needed for the PV):

kubectl delete pv <pv-name>
kubectl -n <namespace> delete pvc <pvc-name>
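
If the PV bound to a given claim is not known, it can be read from the claim's spec before deleting (namespace and claim name are placeholders):

kubectl -n <namespace> get pvc <pvc-name> -o jsonpath='{.spec.volumeName}'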

Scale the CSI controller deployment down and back up:

kubectl scale deployment vsphere-csi-controller --replicas=0 -n vmware-system-csi
kubectl scale deployment vsphere-csi-controller --replicas=3 -n vmware-system-csi
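
To confirm that the controller pods come back up cleanly and the vsphere-syncer container stays running:

kubectl -n vmware-system-csi rollout status deployment vsphere-csi-controller
kubectl -n vmware-system-csi get pods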