Newly created groups are missing, or flow ingestion is paused.
Steps to determine whether the issue has occurred:
1. SSH into the NSX Manager as the "root" user.
2. Run the following curl command inside the Druid broker container to list unhealthy tasks:
napp-k get pod | grep druid-broker
napp-k exec -it <druid-broker name> -- curl "https://druid-router:8280/druid/v2/sql" \
--header 'Content-Type: application/json' \
--data '{
"query": "select task_id, datasource, status, runner_status, created_time from sys.tasks where status = '\''RUNNING'\'' and runner_status = '\''NONE'\''",
"context" : {"sqlQueryId" : "request01"},
"header" : true,
"typesHeader" : true,
"sqlTypesHeader" : true
}' -k
3. If there are unhealthy tasks, the curl command returns results similar to the following:
[{"task_id":{"type":"STRING","sqlType":"VARCHAR"},"datasource":{"type":"STRING","sqlType":"VARCHAR"},"status":{"type":"STRING","sqlType":"VARCHAR"},"runner_status":{"type":"STRING","sqlType":"VARCHAR"},"created_time":{"type":"STRING","sqlType":"VARCHAR"}},{"task_id":"index_kafka_pace2druid_policy_intent_config_3d3715fdc7d4059_ijlghfkc","datasource":"pace2druid_policy_intent_config","status":"RUNNING","runner_status":"NONE","created_time":"2024-05-29T15:12:16.326Z"}]
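As a rough sketch, the affected datasource names can be pulled out of that response with standard text tools, assuming the curl output was first saved to a file. The function name unhealthy_datasources and the file name tasks.json are hypothetical, and the approach relies on the response shape shown above, where the header row maps "datasource" to an object rather than to a quoted string.

```shell
#!/bin/sh
# Hypothetical helper (not part of NSX or Druid): print the unique datasource
# names found in a saved Druid SQL response.
# The header row is skipped automatically because its "datasource" value is an
# object ({"type":...}), not a quoted string.
# Usage: unhealthy_datasources tasks.json
unhealthy_datasources() {
    grep -o '"datasource":"[^"]*"' "$1" | cut -d'"' -f4 | sort -u
}
```

Each line of output is one datasource with at least one task in the RUNNING/NONE state; no output means the query returned no unhealthy tasks.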
NAPP: 4.1.2.1
NSX Intelligence: 4.1.2.1
NSX: 4.1.2
This issue can occur after the Druid, ZooKeeper, Kafka, and Postgres services are all down at the same time, for example after an outage or errors in the Kubernetes cluster.
This issue will be resolved in a later release. Until then, follow the workaround below.
1. Collect the affected datasource names from the results of the previous curl command.
2. Run the curl command below to reset the Druid supervisor for each affected datasource:
napp-k exec -it <druid-broker name> -- curl --request POST "https://druid-router:8280/druid/indexer/v1/supervisor/<Affected Datasource>/reset" -k
3. Delete the nsx-config pod to trigger a full sync:
napp-k get pod | grep nsx-config
napp-k delete pod <nsx-config pod name>
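When several datasources are affected, the reset command in step 2 has to be repeated once per datasource. The following is a minimal sketch of one way to generate those commands, not an official tool: print_reset_commands is a hypothetical helper that only prints the commands so they can be reviewed before being run. It assumes the broker pod name is passed as the first argument and that datasource names arrive on standard input, one per line.

```shell
#!/bin/sh
# Hypothetical helper: print one supervisor reset command per datasource name
# read from standard input. Nothing is executed here; the output is meant to
# be reviewed and then run by hand.
# $1 is the Druid broker pod name found with "napp-k get pod | grep druid-broker".
print_reset_commands() {
    broker="$1"
    while IFS= read -r datasource; do
        # Skip blank lines so empty input produces no commands.
        [ -n "$datasource" ] || continue
        printf 'napp-k exec -it %s -- curl --request POST "https://druid-router:8280/druid/indexer/v1/supervisor/%s/reset" -k\n' \
            "$broker" "$datasource"
    done
}
```

Printing instead of executing keeps the reset a deliberate step, which seems prudent given that a supervisor reset discards the stored ingestion offsets for that datasource.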
If NSX Intelligence flows are underreported or show up as UNCATEGORIZED in the Group View, refer to the following article:
https://knowledge.broadcom.com/external/article?articleNumber=319066