After enabling Intelligence, users do not see some flows on the Plan and Troubleshoot page. Further inspection shows a few Spark application executors in Pending state, with the error: persistentvolumeclaim "rawflowcorrelator-xxxxxx-exec-x-pvc-0" not found
SSH to the SSPI instance, list all Spark application pods, and observe their status:
k -n nsxi-platform get pods | grep rawflow
nsxi-platform rawflowcorrelator-007f6a950395e8fd-exec-1 1/1 Running 0 37m 172.21.7.85 ssp-vsx-md-0-5dpqh-4qgx5 <none> <none>
nsxi-platform rawflowcorrelator-007f6a950395e8fd-exec-2 1/1 Running 0 37m 172.21.7.83 ssp-vsx-md-0-5dpqh-4qgx5 <none> <none>
nsxi-platform rawflowcorrelator-007f6a950395e8fd-exec-3 0/1 Pending 0 37m <none> <none> <none> <none>
or
k -n nsxi-platform get pods | grep overflow
nsxi-platform overflowcorrelator-bb15a0950395cdcd-exec-1 1/1 Running 0 37m 172.21.7.82 ssp-vsx-md-0-5dpqh-4qgx5 <none> <none>
nsxi-platform overflowcorrelator-bb15a0950395cdcd-exec-2 1/1 Running 0 37m 172.21.7.86 ssp-vsx-md-0-5dpqh-4qgx5 <none> <none>
nsxi-platform overflowcorrelator-bb15a0950395cdcd-exec-3 1/1 Running 0 37m 172.21.7.84 ssp-vsx-md-0-5dpqh-4qgx5 <none> <none>
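To pick out only the stuck executors from a listing like the ones above, the output can be filtered on its status field. The following is a minimal sketch, not part of the product tooling: the function name pending_executors is hypothetical, and it assumes executor pod names follow the rawflowcorrelator-<id>-exec-<n> or overflowcorrelator-<id>-exec-<n> pattern seen in the listings.

```shell
# pending_executors (hypothetical helper): reads a "get pods" listing on stdin
# and prints the names of correlator executor pods whose status is Pending.
# It scans every field, so it works with or without a leading namespace column.
pending_executors() {
  awk '{
    name = ""; pending = 0
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^(rawflow|overflow)correlator-.*-exec-[0-9]+$/) name = $i
      if ($i == "Pending") pending = 1
    }
    if (name != "" && pending) print name
  }'
}

# Usage on the SSPI instance (assumes "k" is an alias for kubectl):
#   kubectl -n nsxi-platform get pods | pending_executors
```

An empty result means no correlator executor is currently Pending.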
If any pod is stuck in Pending state for more than 10 minutes, describe the pod to get more details:
k -n nsxi-platform describe po rawflowcorrelator-007f6a950395e8fd-exec-3
If the Events section shows a message like the one below, check the PVC output to see whether the PVC is listed:
Warning FailedScheduling 5m55s (x553 over 8h) default-scheduler 0/8 nodes are available: persistentvolumeclaim "rawflowcorrelator-007f6a950395e8fd-exec-3-pvc-0" not found. preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.
k -n nsxi-platform get pvc | grep rawflow
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
overflowcorrelator-10522194d39223db-exec-1-pvc-0 Bound pvc-48301211-99b0-4a1c-b5d2-edb21628d526 4Gi RWO ssp-china-sc <unset> 13d
rawflowcorrelator-007f6a950395e8fd-exec-1-pvc-0 Bound pvc-bfa66a98-a31a-439b-903c-386dca44d9fb 4Gi RWO ssp-china-sc <unset> 11d
rawflowcorrelator-007f6a950395e8fd-exec-2-pvc-0 Bound pvc-f1358f89-0db4-447c-94d0-5fcf7149860f 4Gi RWO ssp-china-sc <unset> 11d
You should see as many PVCs as there are executors; each executor should have a matching PVC. If the PVC for the Pending executor is missing, proceed to the resolution steps. In the above example, rawflowcorrelator-007f6a950395e8fd-exec-3-pvc-0 is indeed missing.
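The executor-to-PVC comparison described above can also be scripted. This is a sketch under stated assumptions rather than a supported tool: the function name missing_pvcs is hypothetical, and it assumes each executor's PVC is named <executor-pod-name>-pvc-0, as in the error message and PVC listing shown earlier.

```shell
# missing_pvcs (hypothetical helper):
#   $1 = newline-separated executor pod names
#   $2 = newline-separated PVC names
# Prints the expected PVC name (<pod>-pvc-0) for every executor without a PVC.
missing_pvcs() {
  pods=$1
  pvcs=$2
  printf '%s\n' "$pods" | while IFS= read -r pod; do
    [ -n "$pod" ] || continue
    expected="${pod}-pvc-0"
    # -x: match the whole line, -F: treat the name as a fixed string
    if ! printf '%s\n' "$pvcs" | grep -qxF -- "$expected"; then
      printf '%s\n' "$expected"
    fi
  done
}

# Usage on the SSPI instance (assumes "k" is an alias for kubectl):
#   pods=$(kubectl -n nsxi-platform get pods -o name | sed 's|^pod/||' | grep -- '-exec-')
#   pvcs=$(kubectl -n nsxi-platform get pvc -o name | sed 's|^persistentvolumeclaim/||')
#   missing_pvcs "$pods" "$pvcs"
```

Any name printed is a PVC that was never created; if nothing is printed, every executor has a matching PVC and the Pending state has a different cause.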
SSP 5.0.0
PVC creation attempts from the spark-app-rawflow-driver or spark-app-overflow-driver failed during startup due to a slow network connection.
The application whose executors are in Pending state needs to be restarted.
If rawflow executors are stuck in Pending state, then kill the spark-app-rawflow-driver pod.
k -n nsxi-platform delete po spark-app-rawflow-driver
If overflow executors are stuck in Pending state, then kill the spark-app-overflow-driver pod.
k -n nsxi-platform delete po spark-app-overflow-driver
Deleting the driver pod restarts the corresponding flow processing application. This causes a momentary disruption to flow processing, but flows are queued and processed after the application restarts.
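The choice of driver pod in the steps above follows directly from the name of the Pending executor. A minimal sketch of that mapping, assuming the executor naming shown in this article; the function name driver_for_executor is hypothetical:

```shell
# driver_for_executor (hypothetical helper): maps an executor pod name to the
# driver pod that has to be deleted to restart that application.
driver_for_executor() {
  case "$1" in
    rawflowcorrelator-*)  echo "spark-app-rawflow-driver" ;;
    overflowcorrelator-*) echo "spark-app-overflow-driver" ;;
    *) echo "unknown executor: $1" >&2; return 1 ;;
  esac
}

# Usage on the SSPI instance (assumes "k" is an alias for kubectl):
#   kubectl -n nsxi-platform delete po "$(driver_for_executor rawflowcorrelator-007f6a950395e8fd-exec-3)"
```

Failing on an unrecognized name, rather than falling back to a default, avoids restarting the wrong application by accident.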