Symptoms:
- Spark apps, including the rawflow, overflow, and context Spark apps, may be stuck in the Init state.
root@tb301-runner:~[603]# kubectl get pods -n nsxi-platform | grep -v Running | grep -v Completed
NAME                        READY   STATUS     RESTARTS   AGE
spark-app-context-driver    0/2     Init:0/3   0          22h
spark-app-overflow-driver   0/2     Init:0/4   0          22h
- Describing the pods shows that a volume (e.g. spark-conf-volume-driver) fails to mount because the configmap it references is not found.
root@tb301-runner:~[599]# kubectl describe pods spark-app-context-driver -n nsxi-platform
Name: spark-app-context-driver
Namespace: nsxi-platform
Priority: 0
Node: intelligencecluster-workers-r9dc5-7b87b98dff-f7h5n/30.30.0.55
Start Time: Tue, 18 Jul 2023 00:19:00 -0700
Labels: allow-traffic-to-dns=true
....TRUNCATED
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 14m (x552 over 22h) kubelet, intelligencecluster-workers-r9dc5-7b87b98dff-f7h5n MountVolume.SetUp failed for volume "spark-conf-volume-driver" : configmap "spark-drv-b114368967dd9410-conf-map" not found
Warning FailedMount 4m31s (x685 over 21h) kubelet, intelligencecluster-workers-r9dc5-7b87b98dff-f7h5n (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[spark-conf-volume-driver], unattached volumes=[kube-api-access-8hv5g spark-local-dir-1 driver-coredump processing-tls-cert-volume context-log4j-properties-volume context-well-known-user-sid-volume wait-for-secret-scripts context-override-properties-volume spark-conf-volume-driver]: timed out waiting for the condition
root@tb301-runner:~[607]# kubectl describe pods spark-app-overflow-driver -n nsxi-platform
Name: spark-app-overflow-driver
Namespace: nsxi-platform
Priority: 0
Node: intelligencecluster-workers-r9dc5-7b87b98dff-f7h5n/30.30.0.55
Start Time: Tue, 18 Jul 2023 00:19:10 -0700
Labels: allow-traffic-to-dns=true
        allow-traffic-to-kubeapi=true
        app.kubernetes.io/instance=intelligence
....TRUNCATED
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 14m (x560 over 22h) kubelet, intelligencecluster-workers-r9dc5-7b87b98dff-f7h5n MountVolume.SetUp failed for volume "spark-conf-volume-driver" : configmap "spark-drv-31e4058967ddae33-conf-map" not found
Warning FailedMount 4m55s (x696 over 22h) kubelet, intelligencecluster-workers-r9dc5-7b87b98dff-f7h5n (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[spark-conf-volume-driver], unattached volumes=[spark-conf-volume-driver driver-coredump overflow-log4j-properties-volume spark-local-dir-1 kube-api-access-4vvpd processing-tls-cert-volume wait-for-secret-scripts scripts overflow-override-properties-volume]: timed out waiting for the condition
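The checks above can be scripted. The sketch below is an illustration under stated assumptions, not an official procedure. It assumes kubectl access to the nsxi-platform namespace, and the function names (list_init_pods, extract_missing_configmaps, triage_namespace) are hypothetical. It lists pods whose STATUS starts with "Init:", pulls every `configmap "<name>" not found` reference out of `kubectl describe` output, and then asks the API server whether each configmap exists.

```shell
#!/usr/bin/env bash
# Sketch: triage Spark driver pods stuck in Init because a configmap is missing.
# Assumption: kubectl is configured for the cluster; function names are hypothetical.

# Read `kubectl get pods` output on stdin and print the names of pods whose
# STATUS column (3rd field) starts with "Init:". The header row never matches.
list_init_pods() {
  awk '$3 ~ /^Init:/ { print $1 }'
}

# Read `kubectl describe pod` output on stdin and print each distinct configmap
# reported as: configmap "<name>" not found
extract_missing_configmaps() {
  grep -o 'configmap "[^"]*" not found' \
    | sed 's/^configmap "\(.*\)" not found$/\1/' \
    | sort -u
}

# For every Init-stuck pod in a namespace (default: nsxi-platform), print the
# pod name, then query each configmap its events report as missing.
triage_namespace() {
  local ns="${1:-nsxi-platform}"
  kubectl get pods -n "$ns" | list_init_pods | while read -r pod; do
    echo "== $pod"
    kubectl describe pod "$pod" -n "$ns" | extract_missing_configmaps \
      | while read -r cm; do
          kubectl get configmap "$cm" -n "$ns"
        done
  done
}
```

For the pods shown above, running `triage_namespace nsxi-platform` would be expected to report the spark-drv-...-conf-map configmaps as not found, matching the FailedMount events.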