Failed to start recommendation or Druid query failure because Druid router can't connect to Druid broker

Article ID: 379355


Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

Starting a recommendation fails, and the error message indicates a Druid exception. The visualization page may show the same error message.



Symptom:
1. The user cannot start a recommendation; the attempt fails with a Druid exception error message.
2. Druid pods or ZooKeeper pods may have restarted.
3. The Druid router pod log reports that it failed to find the Druid broker.
4. The Druid router returns an empty broker list. Get the Druid router pod name with the command 'napp-k get pod | grep druid-router'. Then run the command 'napp-k exec -it <druid-router name> bash -- curl https://druid-router:8280/druid/router/v1/brokers -k'. When the issue is present, the command returns an empty list like '{"druid/broker":[]}'.
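The check in symptom 4 can be scripted. The following is a minimal sketch, not a supported tool: broker_count is a hypothetical helper name, and it assumes the router response is a single line of JSON in the shape shown above. It prints how many brokers the router reports, so 0 matches the symptom.

```shell
# Hypothetical helper (not part of NAPP): count the brokers in the router's
# /druid/router/v1/brokers response. Reads the JSON response on stdin.
# Assumption: the response is one line shaped like {"druid/broker":[...]}.
# A response that does not match that shape is also reported as 0.
broker_count() {
    # Extract the contents of the "druid/broker" array, then count the
    # quoted entries: splitting on double quotes yields 2 fields per entry
    # plus one, so int(NF / 2) is the number of entries on the line.
    sed -n 's|.*"druid/broker"[[:space:]]*:[[:space:]]*\[\([^]]*\)].*|\1|p' |
        awk -F'"' '{ n += int(NF / 2) } END { print n + 0 }'
}

# Possible usage on the NSX manager, reusing the command from symptom 4:
#   napp-k exec -it <druid-router name> bash -- \
#       curl https://druid-router:8280/druid/router/v1/brokers -k | broker_count
```

A result of 0 corresponds to the empty list described in symptom 4; any positive number means the router can see at least one broker.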



Log:
Run the following command on the NSX Manager as root:

(1) Get the Druid router log with 'napp-k logs <druid-router name>'.


You should see error logs similar to the following:


2024-10-08T12:01:00,837 ERROR [qtp355366659-149] org.apache.druid.server.router.QueryHostFinder - No server found for serviceName[druid/broker]. Using backup
2024-10-08T12:01:00,837 ERROR [qtp355366659-149] org.apache.druid.server.router.QueryHostFinder - No backup found for serviceName[druid/broker]. Using default[druid/broker]
2024-10-08T12:01:00,837 ERROR [qtp355366659-149] org.apache.druid.server.router.QueryHostFinder - Catastrophic failure! No brokers found at all! Failing request!: {class=org.apache.druid.server.router.QueryHostFinder}
2024-10-08T12:01:00,837 WARN [qtp355366659-149] org.apache.druid.server.AsyncQueryForwardingServlet - Unexpected exception occurs
org.apache.druid.query.QueryInterruptedException: There are no available brokers for query[GroupByQuery{dataSource='pace2druid_manager_realization_config', querySegmentSpec=LegacySegmentSpec{intervals=[2024-10-01T00:01:00.000Z/2024-10-08T12:00:38.000Z]}, virtualColumns=[ExpressionVirtualColumn{name='VC_CONCATsource_groups', expression='array_to_string(source_groups,'@@')', outputType=STRING}, ExpressionVirtualColumn{name='VC_CONCATdestination_groups', expression='array_to_string(destination_groups,'@@')', outputType=STRING}, ExpressionVirtualColumn{name='VC_CONCATservices_array', expression='array_to_string(services_array,'@@')', outputType=STRING}], limitSpec=NoopLimitSpec, dimFilter=(rule_id IN (2, 4) && site_id = ecdd91ff-c84c-4dce-9779-1468bde44730 && config_type = MANAGER_DFW_RULE), granularity=AllGranularity, dimensions=[DefaultDimensionSpec{dimension='rule_id', outputName='rule_id', outputType='STRING'}], aggregatorSpecs=[StringLastAggregatorFactory{fieldName='VC_CONCATsource_groups', name='VC_CONCATsource_groups', maxStringBytes=1024, timeColumn=__time}, StringLastAggregatorFactory{fieldName='VC_CONCATdestination_groups', name='VC_CONCATdestination_groups', maxStringBytes=1024, timeColumn=__time}, StringLastAggregatorFactory{fieldName='VC_CONCATservices_array', name='VC_CONCATservices_array', maxStringBytes=1024, timeColumn=__time}, LongLastAggregatorFactory{name='lastUpdateTime', fieldName='__time', timeColumn='__time'}, LongLastAggregatorFactory{name='latest_last_modified_time', fieldName='latest_last_modified_time', timeColumn='__time'}, LongLastAggregatorFactory{name='deleted', fieldName='deleted', timeColumn='__time'}, LongLastAggregatorFactory{name='latest_revision', fieldName='latest_revision', timeColumn='__time'}], postAggregatorSpecs=[ExpressionPostAggregator{name='source_groups', expression='string_to_array(VC_CONCATsource_groups,'@@')', ordering=null, outputType=null}, ExpressionPostAggregator{name='destination_groups', 
expression='string_to_array(VC_CONCATdestination_groups,'@@')', ordering=null, outputType=null}, ExpressionPostAggregator{name='services_array', expression='string_to_array(VC_CONCATservices_array,'@@')', ordering=null, outputType=null}], havingSpec=null, context={queryId=PROCESSING-RAWFLOW-3-9d5f2cf2-7917-4a6f-aa63-27cc0b633564}}].Please check that your brokers are running and healthy.
 at org.apache.druid.query.QueryInterruptedException.wrapIfNeeded(QueryInterruptedException.java:113) ~[druid-processing-29.0.1.jar:29.0.1]
 at org.apache.druid.server.AsyncQueryForwardingServlet.handleException(AsyncQueryForwardingServlet.java:117) ~[druid-services-29.0.1.jar:29.0.1]
 at org.apache.druid.server.AsyncQueryForwardingServlet.service(AsyncQueryForwardingServlet.java:271) ~[druid-services-29.0.1.jar:29.0.1]
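
Rather than reading the whole router log, it can be searched for the two QueryHostFinder messages shown above. This is a minimal sketch under the assumption that those message strings appear verbatim in the log; router_lost_brokers is a hypothetical helper name, not an existing command.

```shell
# Hypothetical helper (not part of NAPP): succeed if the router log on stdin
# contains the broker-discovery failure messages quoted in this article.
# -F treats the patterns as fixed strings, so the brackets need no escaping.
router_lost_brokers() {
    grep -qF \
        -e 'No server found for serviceName[druid/broker]' \
        -e 'No brokers found at all'
}

# Possible usage on the NSX manager:
#   napp-k logs <druid-router name> | router_lost_brokers \
#       && echo "Druid router cannot find a broker"
```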

Environment

NAPP 4.2.0
NSX 4.1.2
Kubernetes Tools version: v1.23.8+vmware.3

Cause

The Druid router cannot discover the Druid broker through ZooKeeper.

Resolution

There's no fix at the moment.

Workaround:


1. Get the Druid pod names with 'napp-k get pod | grep druid', then delete every pod whose name starts with druid-router, druid-broker, druid-coordinator, druid-historical, or druid-middle-manager using the command 'napp-k delete pod <pod name>'. Wait for the replacement pods to come up before continuing.

2. Verify that the router can find the broker again. Get the new Druid router pod name with the command 'napp-k get pod | grep druid-router'. Then run the command 'napp-k exec -it <druid-router name> bash -- curl https://druid-router:8280/druid/router/v1/brokers -k'. The command should return a non-empty list like '{"druid/broker":["xyx.yxy.y.yx:8282"]}'.
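
Step 1 can be scripted so that only the five listed pod types are deleted. The sketch below is illustrative rather than an official procedure: druid_pods_to_delete is a hypothetical helper name, and it assumes the pod name is the first whitespace-separated column of the 'napp-k get pod' output.

```shell
# Hypothetical helper (not part of NAPP): read 'napp-k get pod' output on
# stdin and print the names of pods that step 1 says to delete.
# Assumption: the pod name is the first column of each line.
druid_pods_to_delete() {
    awk '$1 ~ /^druid-(router|broker|coordinator|historical|middle-manager)/ { print $1 }'
}

# Possible usage on the NSX manager:
#   napp-k get pod | druid_pods_to_delete | while read -r pod; do
#       napp-k delete pod "$pod"
#   done
```

Running 'napp-k get pod | druid_pods_to_delete' on its own first shows which pods would be deleted without changing anything.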