Symptoms:
1) After enabling NSX Intelligence/Security Intelligence, the feature shows as down. On further inspection, a few Spark application executors are in Pending state with the error "persistentvolumeclaim "rawflowcorrelator-xxxxxx-exec-x-pvc-0" not found".
SSH into the NSX Manager CLI using root credentials and run the command below to check the rawflowcorrelator pod status.
# napp-k get pods -n nsxi-platform | grep rawflow
nsxi-platform rawflowcorrelator-0x0x0x0x0x0x-exec-3 0/1 Pending 0 37m <none> <none> <none> <none>
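Optionally, you can list only the Pending executors without grepping. This is a sketch that assumes napp-k passes standard kubectl options through and that the executor pods carry the usual Spark-on-Kubernetes spark-role=executor label:
# napp-k get pods -n nsxi-platform -l spark-role=executor --field-selector=status.phase=Pending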
2) Describe the pod for more details on the error and check whether the following appears under "Events":
# napp-k describe pod rawflowcorrelator-0x0x0x0x0x0x-exec-3
"Warning FailedScheduling 5m55s (x553 over 8h) default-scheduler 0/8 nodes are available: persistentvolumeclaim "rawflowcorrelator-0x0x0x0x0x0x-exec-3-pvc-0" not found. preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling."
3) If the above error event is observed in the describe output, check whether the PVC for the Pending executor exists. Run the command below; if the PVC is missing, the output will be empty and you should proceed to the resolution steps.
# napp-k get pvc | grep rawflow
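You can also query the exact PVC name reported in the scheduling error; if it is missing, the command returns a "not found" error. The pod/PVC name below is the placeholder from the example output above and will differ in your environment:
# napp-k get pvc rawflowcorrelator-0x0x0x0x0x0x-exec-3-pvc-0 -n nsxi-platform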
If any of the above symptoms does not match, this KB is not a relevant match for your problem statement. Try checking other KBs for relevant matches.
NAPP 4.1.2
PVC creation attempts from the spark-app-rawflow-driver or spark-app-overflow-driver pods failed during startup due to a slow network connection.
The application whose executors are stuck in Pending state needs to be restarted.
If rawflow executors are stuck in Pending state, restart the spark-app-rawflow-driver pod.
# napp-k delete pod spark-app-rawflow-driver -n nsxi-platform
If overflow executors are stuck in Pending state, restart the spark-app-overflow-driver pod.
# napp-k delete pod spark-app-overflow-driver -n nsxi-platform
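After deleting a driver pod, you can confirm that it is recreated and that the new executors now have their PVCs. The commands below are a sketch; exact pod names will differ in your environment:
# napp-k get pods -n nsxi-platform | grep spark-app-rawflow-driver
# napp-k get pvc -n nsxi-platform | grep rawflow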
Deleting the driver pod restarts the corresponding flow processing application.
This causes a momentary disruption to flow processing, but the flows are queued and processed once the pod restarts.
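To confirm recovery, watch the rawflowcorrelator executor pods until they reach the Running state (assuming the same pod naming as in the symptom output; press Ctrl+C to stop watching):
# napp-k get pods -n nsxi-platform -w | grep rawflowcorrelator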