Rawflowcorrelator pod stuck in Pending state

Article ID: 388841

Products

VMware vDefend Firewall, VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

After enabling Intelligence, users do not see some flows on the Plan and Troubleshoot page. Further inspection shows a few Spark application executors in Pending state with the error: persistentvolumeclaim "rawflowcorrelator-xxxxxx-exec-x-pvc-0" not found.


SSH to the SSPI instance, list all Spark application pods, and check their status:


k -n nsxi-platform get pods -o wide | grep rawflow

nsxi-platform       rawflowcorrelator-007f6a950395e8fd-exec-1                         1/1     Running     0                37m     172.21.7.85       ssp-vsx-md-0-5dpqh-4qgx5   <none>           <none>
nsxi-platform       rawflowcorrelator-007f6a950395e8fd-exec-2                         1/1     Running     0                37m     172.21.7.83       ssp-vsx-md-0-5dpqh-4qgx5   <none>           <none>
nsxi-platform       rawflowcorrelator-007f6a950395e8fd-exec-3                         0/1     Pending     0                37m     <none>            <none>                     <none>           <none>

or

k -n nsxi-platform get pods -o wide | grep overflow

nsxi-platform       overflowcorrelator-bb15a0950395cdcd-exec-1                        1/1     Running     0                37m     172.21.7.82       ssp-vsx-md-0-5dpqh-4qgx5   <none>           <none>
nsxi-platform       overflowcorrelator-bb15a0950395cdcd-exec-2                        1/1     Running     0                37m     172.21.7.86       ssp-vsx-md-0-5dpqh-4qgx5   <none>           <none>
nsxi-platform       overflowcorrelator-bb15a0950395cdcd-exec-3                        1/1     Running     0                37m     172.21.7.84       ssp-vsx-md-0-5dpqh-4qgx5   <none>           <none>

If any pods are stuck in Pending state for more than 10 minutes, describe the pod to get more details:

k -n nsxi-platform describe po rawflowcorrelator-007f6a950395e8fd-exec-3

If the Events section of the describe output shows a message like the one below, the PVC referenced by the executor may be missing:

Warning  FailedScheduling  5m55s (x553 over 8h)  default-scheduler  0/8 nodes are available: persistentvolumeclaim "rawflowcorrelator-007f6a950395e8fd-exec-3-pvc-0" not found. preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.
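The scheduling events for the Pending pod can also be pulled directly with a field selector (a sketch, assuming the example pod name above):

k -n nsxi-platform get events --field-selector involvedObject.name=rawflowcorrelator-007f6a950395e8fd-exec-3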


Next, list the PVCs for the application and check whether a PVC exists for the Pending executor:

k -n nsxi-platform get pvc | grep rawflow

NAME                                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
overflowcorrelator-10522194d39223db-exec-1-pvc-0   Bound    pvc-48301211-99b0-4a1c-b5d2-edb21628d526   4Gi        RWO            ssp-china-sc   <unset>                 13d
rawflowcorrelator-007f6a950395e8fd-exec-1-pvc-0    Bound    pvc-bfa66a98-a31a-439b-903c-386dca44d9fb   4Gi        RWO            ssp-china-sc   <unset>                 11d
rawflowcorrelator-007f6a950395e8fd-exec-2-pvc-0    Bound    pvc-f1358f89-0db4-447c-94d0-5fcf7149860f   4Gi        RWO            ssp-china-sc   <unset>                 11d

You should see as many PVCs here as there are executors; each executor should have a matching PVC. If the PVC for the Pending executor is missing, proceed to the resolution steps. In the example above, rawflowcorrelator-007f6a950395e8fd-exec-3-pvc-0 is indeed missing.
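As a quick sanity check, the pod and PVC counts can be compared directly (a sketch, assuming the example application ID above; substitute the overflowcorrelator prefix for the overflow application):

k -n nsxi-platform get pods | grep rawflowcorrelator-007f6a950395e8fd-exec | wc -l
k -n nsxi-platform get pvc | grep rawflowcorrelator-007f6a950395e8fd-exec | wc -l

If the PVC count is lower than the pod count, at least one executor is missing its PVC.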

Environment

SSP 5.0.0

Cause

PVC creation attempts from the spark-app-rawflow-driver or spark-app-overflow-driver pod failed during startup due to a slow network connection.
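To help confirm this, the driver logs can be searched for PVC-related errors (a sketch; the exact log messages may vary between releases):

k -n nsxi-platform logs spark-app-rawflow-driver | grep -i persistentvolumeclaim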

Resolution

The application whose executors are stuck in Pending state needs to be restarted.
If rawflow executors are stuck in Pending state, delete the spark-app-rawflow-driver pod.


k -n nsxi-platform delete po spark-app-rawflow-driver


If overflow executors are stuck in Pending state, delete the spark-app-overflow-driver pod.


k -n nsxi-platform delete po spark-app-overflow-driver

Deleting the driver pod restarts the flow-processing application. This causes a momentary disruption to flow processing, but flows are queued and processed once the application restarts.
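Once the driver pod has been recreated, the executors and their PVCs can be verified again (shown here for the rawflow application; use overflow for the overflow application):

k -n nsxi-platform get pods -o wide | grep rawflow
k -n nsxi-platform get pvc | grep rawflow

All executor pods should reach Running state, and each executor should have a matching Bound PVC.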