NSX Intelligence - No traffic is seen in the Plan and Troubleshoot view. Some traffic may be seen in older time selections, but not in the Now view or in more recent time windows.

Article ID: 317748

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:
Flows are not seen in the Plan and Troubleshoot view in NSX Intelligence 3.2.1 or 4.0.1.

Even though spark-app-rawflow-driver and spark-app-overflow-driver show a status of 'Running', the apps are not processing flows.
To confirm this, log in to the NSX Manager as root and check the driver logs with the following commands:

napp-k logs spark-app-rawflow-driver -c spark-kubernetes-driver

napp-k logs spark-app-overflow-driver -c spark-kubernetes-driver

If the logs contain the following exception, the flow-processing app is in an error state and must be restarted:

ERROR JobScheduler JobScheduler - Error in job generator
java.lang.IllegalStateException: JobGenerator has already been stopped accidentally.
        at org.apache.spark.util.EventLoop.post(EventLoop.scala:107)
        at org.apache.spark.streaming.scheduler.JobGenerator.$anonfun$timer$1(JobGenerator.scala:63)
        at org.apache.spark.streaming.util.RecurringTimer.triggerActionForNextInterval(RecurringTimer.scala:94)
        at org.apache.spark.streaming.util.RecurringTimer.org$apache$spark$streaming$util$RecurringTimer$$loop(RecurringTimer.scala:106)
        at org.apache.spark.streaming.util.RecurringTimer$$anon$1.run(RecurringTimer.scala:29)
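If you need to repeat this check, for example after restarting the pods, it can be scripted. The following is a minimal Python sketch, not a VMware-provided tool: it runs the same napp-k logs command for each driver pod and searches the output for the exception message quoted above. It assumes napp-k can be run as a command from a non-interactive process; if it is a shell alias on your NSX Manager, pass the command it expands to as cmd_prefix. The function names are hypothetical.

```python
import subprocess

# Exception message quoted in this article for a stalled flow-processing app.
ERROR_SIGNATURE = "JobGenerator has already been stopped accidentally"

# Driver pods named in this article.
DRIVER_PODS = ("spark-app-rawflow-driver", "spark-app-overflow-driver")


def log_shows_stalled_job_generator(log_text: str) -> bool:
    """Return True if a driver log contains the JobGenerator exception."""
    return ERROR_SIGNATURE in log_text


def fetch_driver_log(pod: str, cmd_prefix: tuple[str, ...] = ("napp-k",)) -> str:
    """Return the spark-kubernetes-driver container log for one pod.

    Assumption (hypothetical helper): cmd_prefix can be executed directly.
    If napp-k is a shell alias, pass the command it expands to instead.
    """
    result = subprocess.run(
        [*cmd_prefix, "logs", pod, "-c", "spark-kubernetes-driver"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_stalled_drivers(cmd_prefix: tuple[str, ...] = ("napp-k",)) -> list[str]:
    """Return the driver pods whose logs show the exception."""
    return [
        pod
        for pod in DRIVER_PODS
        if log_shows_stalled_job_generator(fetch_driver_log(pod, cmd_prefix))
    ]
```

On the NSX Manager, find_stalled_drivers() returns the names of the pods to delete in the workaround below; an empty list means neither driver log contains the exception.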

Cause

This issue occurs after a prolonged Kafka outage, or when the MinIO cluster is unavailable for reads or writes. In NSX Intelligence 4.0.1, a known issue can make the MinIO cluster unavailable for writes.

Resolution

This issue is resolved in NSX Intelligence 4.1.1.

Workaround:
Restart each driver pod whose logs show the error above by deleting it. If both spark-app-rawflow-driver and spark-app-overflow-driver show the error, delete both pods. Kubernetes restarts deleted pods automatically.

Log in to the NSX Manager as root and run the following commands:

napp-k delete pod spark-app-rawflow-driver
napp-k delete pod spark-app-overflow-driver
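The workaround deletes only the driver pods whose logs show the error. A minimal Python sketch of that selection follows; it is hypothetical and not a VMware tool. Given each pod's log text, it builds the napp-k delete pod command for each affected pod and runs the commands only when dry_run is False. As above, it assumes napp-k can be executed directly; otherwise pass the command it expands to as cmd_prefix.

```python
import subprocess

# Exception message quoted in this article.
ERROR_SIGNATURE = "JobGenerator has already been stopped accidentally"


def delete_commands(
    driver_logs: dict[str, str], cmd_prefix: tuple[str, ...] = ("napp-k",)
) -> list[list[str]]:
    """Build a 'delete pod' command for each driver whose log shows the error."""
    return [
        [*cmd_prefix, "delete", "pod", pod]
        for pod, log_text in driver_logs.items()
        if ERROR_SIGNATURE in log_text
    ]


def restart_stalled_drivers(
    driver_logs: dict[str, str],
    cmd_prefix: tuple[str, ...] = ("napp-k",),
    dry_run: bool = True,
) -> list[list[str]]:
    """Delete the affected driver pods; per this article, Kubernetes restarts them.

    With dry_run=True (the default) the commands are only printed, not run.
    """
    commands = delete_commands(driver_logs, cmd_prefix)
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Defaulting to a dry run keeps the sketch from deleting pods by accident; review the printed commands before calling it with dry_run=False.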

Wait about 5 to 10 minutes for the pods to restart, then check the logs again to confirm that the error no longer appears.
If the same error appears, check the status of the Kafka and MinIO services. If both services are running, check whether the MinIO disk is full, as described in KB 91696. After cleaning up the MinIO disk according to the instructions in that KB, restart the driver pods again.
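The checks after a restart form a short decision sequence. The Python sketch below encodes it for illustration only; the names are hypothetical, and the branch for a stopped Kafka or MinIO service is an assumption, because the article only says to check their status.

```python
from enum import Enum


class NextStep(Enum):
    DONE = "Error no longer in the logs: no further action in this article."
    # Assumption: the article does not say what to do if a service is down.
    RESTORE_SERVICES = "Kafka or MinIO is not running: investigate that service first."
    CLEAN_MINIO_DISK = (
        "Both services running: check whether the MinIO disk is full (KB 91696), "
        "clean it up, then restart the driver pods again."
    )


def next_step(error_persists: bool, kafka_running: bool, minio_running: bool) -> NextStep:
    """Pick the next action after waiting 5 to 10 minutes for the pods to restart."""
    if not error_persists:
        return NextStep.DONE
    if not (kafka_running and minio_running):
        return NextStep.RESTORE_SERVICES
    return NextStep.CLEAN_MINIO_DISK
```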

Additional Information

Impact/Risks:
Users cannot see any flow records that NSX Intelligence receives after this error occurs. Because no new traffic appears in the UI, no recommendations can be generated for that traffic.