NSX Intelligence flows are underreported or show up as UNCATEGORIZED in the Group View
Article ID: 319066
Products
VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention
Issue/Introduction
Symptoms: Flows in the Group View appear as UNCATEGORIZED, or flow ingestion has paused. In the latter case, a reduction in flows may also be seen in the Compute view.
There are a few ways to determine whether this issue has occurred:
1. Open the Druid console and check whether there is a "RUNNING" ingestion task for each supervisor.
SSH into the NSX Manager as the "root" user, then access the druid-overlord pod from the NSX Manager:
napp-k exec -it svc/druid-overlord -- bash
Inside the overlord pod, call the following API to check how many tasks are running:
curl -X GET 'https://localhost:8290/druid/indexer/v1/runningTasks' -k
In the output, search for the keyword "dataSource". At least one task should be running for each of the following datasources: correlated_flow, correlated_flow_viz, correlated_flow_rec, pace2druid_manager_realization_config, pace2druid_policy_intent_config.
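The per-datasource check above can be scripted. The sketch below runs the same membership test against a hypothetical, truncated sample of the runningTasks response (the SAMPLE variable is illustrative only); on a live system you would pipe the output of the curl command in instead of the sample.

```shell
#!/bin/sh
# SAMPLE is a hypothetical, truncated runningTasks response for illustration.
# On a live overlord pod, replace it with the output of:
#   curl -s -X GET 'https://localhost:8290/druid/indexer/v1/runningTasks' -k
SAMPLE='[{"id":"index_kafka_correlated_flow_abc","dataSource":"correlated_flow"},
{"id":"index_kafka_correlated_flow_viz_def","dataSource":"correlated_flow_viz"}]'

# Report which expected datasources have (or lack) a running task.
for ds in correlated_flow correlated_flow_viz correlated_flow_rec \
          pace2druid_manager_realization_config pace2druid_policy_intent_config; do
  if printf '%s' "$SAMPLE" | grep -q "\"dataSource\":\"$ds\""; then
    echo "OK: $ds"
  else
    echo "MISSING: $ds"
  fi
done
```

Any datasource reported as MISSING points to a supervisor whose ingestion task is not running.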
2. Search for the issue in the logs.
To get the logs, SSH into the NSX Manager as the "root" user. On the NSX Manager, run:
napp-k get pods --selector='app.kubernetes.io/component=druid.overlord'
You should find a pod whose name starts with "druid-overlord-", as in this example:
NAME                              READY   STATUS    RESTARTS   AGE
druid-overlord-7b6849f98b-n97xm   1/1     Running   1          11h
Then run the following command to check the logs:
napp-k logs <name of the druid overlord pod>
Example:
napp-k logs druid-overlord-7b6849f98b-n97xm
2022-08-25T20:31:17,944 INFO [KafkaSupervisor-correlated_flow-Worker-0] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - Setting taskGroup sequences to [{0={6=37104640}}] for group [6]
2022-08-25T20:31:18,053 INFO [KafkaSupervisor-correlated_flow] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [correlated_flow] supervisor is running.
In the logs, search for the string "durationSeconds=600, active=[{id='index_kafka_"; you should see log entries reporting the status of each supervisor.
(1) If you are seeing UNCATEGORIZED flows, search for: "durationSeconds=600, active=[{id='index_kafka_pace2druid_manager_realization_config"
(2) If you are not seeing any flows, search for: "durationSeconds=600, active=[{id='index_kafka_correlated_flow_viz"
There will be a task id inside the "active" field. Search the logs for that task id. A task should not stay in the active list for more than 10 minutes; if it persists for more than 20 minutes, the issue may be caused by the Druid overlord. In the example below, the same task "index_kafka_pace2druid_manager_realization_config_fb79e2e6d49f685_fpfhoaap" is still active after 2.5 hours:
2022-08-01T20:07:16.741076711Z stdout F 2022-08-01T20:07:16,740 INFO [KafkaSupervisor-pace2druid_manager_realization_config] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id='pace2druid_manager_realization_config', generationTime=2022-08-01T20:07:16.740Z, payload=KafkaSupervisorReportPayload{dataSource='pace2druid_manager_realization_config', topic='pace2druid_manager_realization_config', partitions=1, replicas=1, durationSeconds=600, active=[{id='index_kafka_pace2druid_manager_realization_config_fb79e2e6d49f685_fpfhoaap', startTime=null, remainingSeconds=null}], publishing=[], suspended=false, healthy=false, state=UNHEALTHY_SUPERVISOR, detailedState=UNABLE_TO_CONNECT_TO_STREAM, recentErrors=[ExceptionEvent{timestamp=2022-08-01T19:58:16.726Z, exceptionClass='org.apache.kafka.common.errors.TimeoutException', message='org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata'}, ExceptionEvent{timestamp=2022-08-01T19:59:16.727Z, exceptionClass='org.apache.kafka.common.errors.TimeoutException', message='org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata'}, 
ExceptionEvent{timestamp=2022-08-01T20:00:03.097Z, exceptionClass='org.apache.druid.java.util.common.ISE', message='org.apache.druid.java.util.common.ISE: No partitions found for stream [pace2druid_manager_realization_config]'}]}} 2022-08-01T22:47:16.743645745Z stdout F 2022-08-01T22:47:16,742 INFO [KafkaSupervisor-pace2druid_manager_realization_config] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id='pace2druid_manager_realization_config', generationTime=2022-08-01T22:47:16.742Z, payload=KafkaSupervisorReportPayload{dataSource='pace2druid_manager_realization_config', topic='pace2druid_manager_realization_config', partitions=1, replicas=1, durationSeconds=600, active=[{id='index_kafka_pace2druid_manager_realization_config_fb79e2e6d49f685_fpfhoaap', startTime=null, remainingSeconds=null}], publishing=[], suspended=false, healthy=true, state=RUNNING, detailedState=RUNNING, recentErrors=[ExceptionEvent{timestamp=2022-08-01T19:58:16.726Z, exceptionClass='org.apache.kafka.common.errors.TimeoutException', message='org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata'}, ExceptionEvent{timestamp=2022-08-01T19:59:16.727Z, exceptionClass='org.apache.kafka.common.errors.TimeoutException', message='org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata'}, ExceptionEvent{timestamp=2022-08-01T20:00:03.097Z, exceptionClass='org.apache.druid.java.util.common.ISE', message='org.apache.druid.java.util.common.ISE: No partitions found for stream [pace2druid_manager_realization_config]'}]}}
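Rather than eyeballing timestamps across log entries, the comparison can be sketched in shell. The LOG variable below holds two condensed, hypothetical lines modeled on the example above; on a real system you would grep the overlord pod's log output instead.

```shell
#!/bin/sh
# Sketch: detect a stuck indexing task by comparing the first and last log
# timestamps at which the same active task id appears.
# LOG is condensed sample data; in practice use:
#   LOG=$(napp-k logs <druid-overlord-pod> | grep "durationSeconds=600, active=")
LOG="2022-08-01T20:07:16 ... active=[{id='index_kafka_pace2druid_manager_realization_config_fb79e2e6d49f685_fpfhoaap', startTime=null}] ...
2022-08-01T22:47:16 ... active=[{id='index_kafka_pace2druid_manager_realization_config_fb79e2e6d49f685_fpfhoaap', startTime=null}] ..."

# Pull the task id out of the first "active" field.
task_id=$(printf '%s\n' "$LOG" | grep -o "active=\[{id='[^']*'" | head -n 1 | cut -d"'" -f2)
# First and last timestamps (first 19 characters of each matching line).
first=$(printf '%s\n' "$LOG" | grep "$task_id" | head -n 1 | cut -c1-19)
last=$(printf '%s\n' "$LOG" | grep "$task_id" | tail -n 1 | cut -c1-19)
echo "task: $task_id"
echo "first seen: $first"
echo "last seen:  $last"   # more than 20 minutes apart => overlord likely stuck
```

Here the task is seen from 20:07 to 22:47, about 2.5 hours, which is well past the 20-minute threshold described above.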
This is a known issue in NSX Intelligence 4.0.1.
Cause
This issue can occur after the druid, zookeeper, kafka, and postgres services all go down at the same time, for example after an outage or errors in the Kubernetes cluster.
Resolution
This issue will be resolved in a later release. In the meantime, use the following workaround.
Workaround: Restart the druid-overlord pod in the nsxi-platform namespace by deleting it with the following command:
napp-k delete pod <name of the druid overlord pod>
Example:
napp-k delete pod druid-overlord-7b6849f98b-n97xm
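The pod-name lookup and the delete can be chained so the name is never copied by hand. The sketch below extracts the pod name from sample get-pods output (hypothetical, copied from the example earlier in this article); the same awk expression works on the live command's output.

```shell
#!/bin/sh
# PODS is sample output for illustration; on a live system use:
#   PODS=$(napp-k get pods --selector='app.kubernetes.io/component=druid.overlord')
PODS='NAME                              READY   STATUS    RESTARTS   AGE
druid-overlord-7b6849f98b-n97xm   1/1     Running   1          11h'

# Skip the header row and take the first column (the pod name).
pod=$(printf '%s\n' "$PODS" | awk 'NR>1 {print $1}')
echo "napp-k delete pod $pod"   # the command to run against the platform
```

Because the pod is managed by a Deployment, Kubernetes recreates it automatically after the delete; verify with the same napp-k get pods command.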