NSX Intelligence - No traffic is seen in the Plan and Troubleshoot view. Some traffic may be seen in older time selections, but not in the Now view or in more recent time windows.
Article ID: 317748
Products
VMware NSX
Issue/Introduction
Symptoms: Flows are not seen in the Plan and Troubleshoot view in NSX Intelligence 3.2.1 or 4.0.1.
Even though the status of spark-app-rawflow-driver and spark-app-overflow-driver shows as 'Running', the app is not processing flows. To confirm this, log in to the NSX Manager as the root user and check the logs of both driver pods.
If you see logs with this exception, then the flow processing app is in an error state and needs to be restarted:

ERROR JobScheduler JobScheduler - Error in job generator
java.lang.IllegalStateException: JobGenerator has already been stopped accidentally.
    at org.apache.spark.util.EventLoop.post(EventLoop.scala:107)
    at org.apache.spark.streaming.scheduler.JobGenerator.$anonfun$timer$1(JobGenerator.scala:63)
    at org.apache.spark.streaming.util.RecurringTimer.triggerActionForNextInterval(RecurringTimer.scala:94)
    at org.apache.spark.streaming.util.RecurringTimer.org$apache$spark$streaming$util$RecurringTimer$$loop(RecurringTimer.scala:106)
    at org.apache.spark.streaming.util.RecurringTimer$$anon$1.run(RecurringTimer.scala:29)
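The check for this exception can be sketched as a small shell helper that scans log text for the error message. This is only an illustration: the function name `has_jobgenerator_error` is hypothetical and not part of NSX, and it assumes the driver pod log is available as plain text on standard input.

```shell
# Sketch only: has_jobgenerator_error is a hypothetical helper, not an NSX command.
# It reads log text on stdin and exits with status 0 if the JobGenerator
# error message quoted in this article is present, non-zero otherwise.
has_jobgenerator_error() {
    grep -q 'JobGenerator has already been stopped accidentally'
}
```

As a usage example, and assuming `napp-k` passes the `logs` subcommand through to kubectl in your environment, `napp-k logs spark-app-rawflow-driver | has_jobgenerator_error && echo "restart needed"` would print a message only when the pod log contains the error.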
Cause
This issue occurs when Kafka is unavailable for an extended period or when the MinIO cluster is not available for reads or writes. The MinIO cluster being unavailable for writes can be caused by a known issue in 4.0.1.
Resolution
This issue is resolved in NSX Intelligence 4.1.1.
Workaround: Restart the driver pods that have the above error in the logs. If both spark-app-rawflow-driver and spark-app-overflow-driver pods have the same error, then delete both. They will be automatically restarted by Kubernetes.
Log in to the NSX Manager as the root user and run the following commands:

napp-k delete pod spark-app-rawflow-driver
napp-k delete pod spark-app-overflow-driver

Wait about 5-10 minutes for the pods to start again, then confirm that the same error no longer appears in the logs. If the error does appear again, check the status of the Kafka and MinIO services. If both services are running, check whether the MinIO disk is full using KB 91696. Once the MinIO disk has been cleaned up according to the instructions in that KB, restart the driver pods again.
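The workaround of restarting only the pods that show the error can be sketched as a shell function. This is a sketch under stated assumptions, not a supported procedure: `KUBE_CMD` and `restart_failed_drivers` are hypothetical names introduced here, and the sketch assumes `napp-k` accepts the kubectl `logs` subcommand in the same way it accepts `delete pod`.

```shell
# Sketch only. KUBE_CMD is a hypothetical variable that defaults to the
# napp-k command used in this article.
KUBE_CMD="${KUBE_CMD:-napp-k}"
ERROR_SIGNATURE='JobGenerator has already been stopped accidentally'

# For each pod name given as an argument, delete the pod only if its
# logs contain the error message; Kubernetes then recreates the pod.
restart_failed_drivers() {
    for pod in "$@"; do
        if $KUBE_CMD logs "$pod" 2>/dev/null | grep -q "$ERROR_SIGNATURE"; then
            echo "restarting $pod"
            $KUBE_CMD delete pod "$pod"
        else
            echo "skipping $pod"
        fi
    done
}
```

A possible invocation is `restart_failed_drivers spark-app-rawflow-driver spark-app-overflow-driver`. If `napp-k` is defined as a shell alias on your system, it will not expand through a variable, so set `KUBE_CMD` to the underlying command first.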
Additional Information
Impact/Risks: Users cannot see any flow records that NSX Intelligence receives after this error occurs. No new traffic appears in the UI, so no recommendations can be generated for that traffic.