How to implement a lucene index on array field

Article ID: 294392

Products

VMware Tanzu GemFire

Issue/Introduction

A Lucene search on a field inside an array can fail with a query parsing exception instead of returning results when the field is referenced with an array subscript such as systemIds[0]. This KB describes the correct way to implement and query a Lucene index on an array field.

For example:
gfsh>search lucene --name=arrayIndex --region=/exampleRegion --queryString="exampleIdentifier.systemIds[0].system:CDE" --defaultField="exampleIdentifier.systemIds[0].system"

Error while processing command <search lucene --name=arrayIndex --region=/exampleRegion --queryString="exampleIdentifier.systemIds[0].system:CDE" --defaultField="exampleIdentifier.systemIds[0].system"> Reason : class java.lang.NullPointerException cannot be cast to class java.util.Set (java.lang.NullPointerException and java.util.Set are in module java.base of loader 'bootstrap')
gfsh>show log --member=gemfire1-server-0
[info 2021/03/05 06:43:33.125 GMT <Function Execution Processor5> tid=0x12f] Unexpected exception during function execution on local node Partitioned Region
org.apache.geode.cache.execute.FunctionException: org.apache.geode.cache.lucene.LuceneQueryException: Parsing query 'exampleIdentifier.systemIds[0].system:CDE' failed due to: Syntax Error, cannot parse exampleIdentifier.systemIds[0].system:CDE:
	at org.apache.geode.cache.lucene.internal.distributed.LuceneQueryFunction.getQuery(LuceneQueryFunction.java:220)
	at org.apache.geode.cache.lucene.internal.distributed.LuceneQueryFunction.execute(LuceneQueryFunction.java:136)
	at org.apache.geode.cache.lucene.internal.distributed.LuceneQueryFunction.execute(LuceneQueryFunction.java:81)
	at org.apache.geode.internal.cache.execute.AbstractExecution.executeFunctionLocally(AbstractExecution.java:328)
	at org.apache.geode.internal.cache.execute.AbstractExecution.lambda$executeFunctionOnLocalPRNode$0(AbstractExecution.java:273)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:442)
	at org.apache.geode.distributed.internal.ClusterOperationExecutors.doFunctionExecutionThread(ClusterOperationExecutors.java:377)
	at org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
	at java.base/java.lang.Thread.run(Thread.java:834)

Caused by: org.apache.geode.cache.lucene.LuceneQueryException: Parsing query 'exampleIdentifier.systemIds[0].system:CDE' failed due to: Syntax Error, cannot parse exampleIdentifier.systemIds[0].system:CDE:
	at org.apache.geode.cache.lucene.internal.StringQueryProvider.getQuery(StringQueryProvider.java:79)
	at org.apache.geode.cache.lucene.internal.distributed.LuceneQueryFunction.getQuery(LuceneQueryFunction.java:217)
	... 10 more
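
The stack trace reports a syntax error but does not name the offending character. A likely cause is that `[` and `]` are operators in Lucene's query syntax (they delimit range queries), so a subscripted field name cannot be parsed as a plain field. The sketch below is a hypothetical pre-check, not part of GemFire or Lucene: the class `LuceneFieldNameCheck` and its method are illustrative names, and the character list is taken from Lucene's query syntax documentation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags characters in a field name that Lucene's query syntax treats as operators.
class LuceneFieldNameCheck {
  // Single-character operators listed as special in Lucene's query syntax documentation.
  // The two-character operators && and || are left out to keep the sketch short.
  private static final String RESERVED = "+-!(){}[]^\"~*?:\\/";

  // Returns the reserved characters found in the field name, in order of appearance.
  static List<Character> findReservedCharacters(String fieldName) {
    List<Character> found = new ArrayList<>();
    for (char c : fieldName.toCharArray()) {
      if (RESERVED.indexOf(c) >= 0) {
        found.add(c);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    // The subscripted name from the failing query contains both bracket characters.
    System.out.println(findReservedCharacters("exampleIdentifier.systemIds[0].system")); // prints [[, ]]
    // The name without a subscript contains no reserved characters.
    System.out.println(findReservedCharacters("exampleIdentifier.systemIds.system")); // prints []
  }
}
```

A check like this could run before a query string is sent to the cluster, so that a subscripted field name is rejected on the client with a clearer message.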


Environment

Product Version: 9.10

Resolution

The following working example implements a Lucene index on an array field.

Step 1:

Implement the domain classes.
  • ExampleRegion.java
  • ExampleIdentifier.java
  • SystemId.java
package examples;

import java.io.Serializable;

public class ExampleRegion implements Serializable {
  private ExampleIdentifier exampleIdentifier;
  public ExampleRegion() {}
  public ExampleRegion(ExampleIdentifier exampleIdentifier) {
    this.exampleIdentifier = exampleIdentifier;
  }
  public void setExampleIdentifier(ExampleIdentifier exampleIdentifier) {
    this.exampleIdentifier = exampleIdentifier;
  }
  public ExampleIdentifier getExampleIdentifier() {
    return this.exampleIdentifier;
  }
}
package examples;

import java.io.Serializable;

public class ExampleIdentifier implements Serializable {
  private String exampleId;
  private SystemId[] systemIds;
  public ExampleIdentifier() {}
  public ExampleIdentifier(String exampleId, SystemId[] systemIds) {
    this.exampleId = exampleId;
    this.systemIds = systemIds;
  }
  public void setExampleId(String exampleId) {
    this.exampleId = exampleId;
  }
  public void setSystemIds(SystemId[] systemIds) {
    this.systemIds = systemIds;
  }
  public String getExampleId() {
    return this.exampleId;
  }
  public SystemId[] getSystemIds() {
    return this.systemIds;
  }
}
package examples;

import java.io.Serializable;

public class SystemId implements Serializable {
  private String id;
  private String system;
  public SystemId() {}
  public SystemId(String id, String system) {
    this.id = id;
    this.system = system;
  }
  public void setId(String id) {
    this.id = id;
  }
  public void setSystem(String system) {
    this.system = system;
  }
}

Step 2:

Build a JAR file, such as lucene_example-0.0.1.jar, and then deploy it to a running GemFire cluster:
deploy --jar=/Users/user1/build/libs/lucene_example-0.0.1.jar

Step 3:

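Create a Lucene index (name=arrayIndex) before creating the region (name=exampleRegion). Specify the field without an array subscript, and use FlatFormatSerializer so that nested objects and array elements are indexed: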
create lucene index --name=arrayIndex --region=/exampleRegion --field=exampleIdentifier.systemIds.system --analyzer=DEFAULT --serializer=org.apache.geode.cache.lucene.FlatFormatSerializer
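
To see why the index field is written without a subscript, it can help to picture how a flattening serializer might resolve the dotted path: whenever a segment holds an array, every element contributes a value to the same field. The following is a conceptual sketch under that assumption, not the code of FlatFormatSerializer. The class `FlattenSketch`, the method `resolve`, and the trimmed nested classes are illustrative stand-ins for the domain classes from Step 1.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only; this is not Geode's FlatFormatSerializer implementation.
class FlattenSketch {
  // Trimmed stand-ins for the domain classes from Step 1.
  static class SystemId {
    String id;
    String system;
    SystemId(String id, String system) { this.id = id; this.system = system; }
  }

  static class ExampleIdentifier {
    String exampleId;
    SystemId[] systemIds;
    ExampleIdentifier(String exampleId, SystemId[] systemIds) {
      this.exampleId = exampleId;
      this.systemIds = systemIds;
    }
  }

  static class ExampleRegion {
    ExampleIdentifier exampleIdentifier;
    ExampleRegion(ExampleIdentifier exampleIdentifier) { this.exampleIdentifier = exampleIdentifier; }
  }

  // Resolves a dotted path, fanning out over every element whenever a segment holds an array.
  static List<Object> resolve(Object root, String path) {
    List<Object> current = new ArrayList<>();
    current.add(root);
    for (String segment : path.split("\\.")) {
      List<Object> next = new ArrayList<>();
      for (Object holder : current) {
        Object value;
        try {
          Field field = holder.getClass().getDeclaredField(segment);
          field.setAccessible(true);
          value = field.get(holder);
        } catch (ReflectiveOperationException e) {
          throw new IllegalArgumentException("Cannot resolve segment '" + segment + "'", e);
        }
        if (value == null) {
          continue; // a null field contributes no values
        }
        if (value.getClass().isArray()) {
          for (int i = 0; i < Array.getLength(value); i++) {
            Object element = Array.get(value, i);
            if (element != null) {
              next.add(element);
            }
          }
        } else {
          next.add(value);
        }
      }
      current = next;
    }
    return current;
  }

  public static void main(String[] args) {
    ExampleRegion value = new ExampleRegion(new ExampleIdentifier("000000001",
        new SystemId[] {new SystemId("ID", "ABC"), new SystemId("ID1", "CDE")}));
    // Both array elements contribute a value to the single flattened field.
    System.out.println(resolve(value, "exampleIdentifier.systemIds.system")); // prints [ABC, CDE]
  }
}
```

Under this model, a query for CDE on exampleIdentifier.systemIds.system matches an entry when any element of the array carries that value, regardless of its position.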

Step 4:

Create a partitioned region (name=exampleRegion) and put some sample data into it:
create region --name=exampleRegion --type=PARTITION
 
put --region=/exampleRegion --key=1 --value="('exampleIdentifier': {'exampleId': '000000000','systemIds':[{'id': 'ID','system': 'ABC'}]})" --value-class="examples.ExampleRegion"
 
put --region=/exampleRegion --key="('id':'2')" --value="('exampleIdentifier': {'exampleId': '000000001','systemIds':[{'id': 'ID','system': 'ABC'}, {'id': 'ID1','system': 'CDE'}]})" --value-class="examples.ExampleRegion"

Step 5:

Search the array field through the Lucene index using the field name without an array subscript: exampleIdentifier.systemIds.system, not exampleIdentifier.systemIds[0].system.
gfsh>search lucene --name=arrayIndex --region=/exampleRegion --queryString="exampleIdentifier.systemIds.system:CDE" --defaultField="exampleIdentifier.systemIds.system"

   key     |              value              | score
---------- | ------------------------------- | ----------
{'id':'2'} | examples.ExampleRegion@620c60d6 | 0.25811607
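
If search commands are assembled by a script from subscripted paths, the subscript can be stripped before the command is built. The sketch below is a hypothetical helper, not a GemFire API: the class `LuceneSearchCommand` and its methods are illustrative names. It uses only the gfsh options shown in this article and assumes the index, region, and search term contain no characters that need quoting.

```java
// Hypothetical helper: builds the gfsh search command from a possibly subscripted field path.
class LuceneSearchCommand {
  // Removes numeric array subscripts such as [0] from a dotted field path.
  static String stripArraySubscripts(String fieldPath) {
    return fieldPath.replaceAll("\\[\\d+\\]", "");
  }

  // Formats the command with the subscript-free field as both the query field and the default field.
  static String build(String indexName, String regionPath, String fieldPath, String term) {
    String field = stripArraySubscripts(fieldPath);
    return String.format(
        "search lucene --name=%s --region=%s --queryString=\"%s:%s\" --defaultField=\"%s\"",
        indexName, regionPath, field, term, field);
  }

  public static void main(String[] args) {
    // Produces the same command as the working search in Step 5.
    System.out.println(build("arrayIndex", "/exampleRegion",
        "exampleIdentifier.systemIds[0].system", "CDE"));
  }
}
```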