Performance issue while fetching data from ElasticSearch

Article ID: 386515

Products

VMware Smart Assurance

Issue/Introduction

Reports that fetch data from the ElasticSearch (ES) DB are incomplete, and generating such reports is slow. The following exception is seen in the ES logs:

org.elasticsearch.transport.RemoteTransportException: [sYhmb3X][##.###.###.###:9300][indices:data/read/search[phase/query]]

Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TcpTransport$RequestHandler@18223c6a on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@13b85ec1[Running, pool size = 7, active threads = 7, queued tasks = 1000, completed tasks = 858716045]]

    at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:50) ~[elasticsearch-5.2.0.jar:5.2.0]

    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_321]

    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) ~[?:1.8.0_321]

Environment

Watch4net|M&R - 7.7

Cause

The exception above shows that ES cannot keep up with the number of incoming search requests. The search thread pool's task queue is constantly full, so the node rejects new search tasks until the number of queued tasks drops below the queue capacity of 1000 (the default maximum).
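To see how saturated the search pool was at the moment of a rejection, the counters can be pulled out of the exception line itself. The following is a minimal sketch, not part of the product: the function name es_rejection_stats is hypothetical, and it assumes the log line has the same layout as the exception shown in this article.

```shell
# Hypothetical helper: reads ES log lines on stdin and prints the thread pool
# counters found in each EsRejectedExecutionException line.
# Assumes the message layout shown in this article (ES 5.2.0).
es_rejection_stats() {
  sed -nE 's/.*queue capacity = ([0-9]+).*pool size = ([0-9]+), active threads = ([0-9]+), queued tasks = ([0-9]+).*/capacity=\1 pool=\2 active=\3 queued=\4/p'
}

# Usage (the log file path is site-specific and not given in this article):
#   grep EsRejectedExecutionException "$ES_LOG" | es_rejection_stats
```

When active equals pool and queued equals capacity, as in the exception above, every worker thread is busy and the queue is full, which is the condition that triggers rejections.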

Resolution

There are three options:

  1. Reduce the workload.
  2. Add more nodes to the cluster to share the workload.
  3. Get stronger hardware.

The first option may not always be feasible. If it is not, grow the cluster's work capacity by either adding more nodes or improving the hardware. In the error above, the ES pool size is only 7, which indicates a host with 4 CPUs. In general, ES uses a search pool size of (1.5 x number of cores) + 1. For instance, if the host has 24 CPUs, the pool size will be 37, giving 37 worker threads to handle the queued tasks.
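The pool size formula above can be checked with a few lines of shell. This is only a sketch: the function name search_pool_size is hypothetical, and rounding down for odd core counts (integer division) is an assumption that the formula as stated in this article does not spell out.

```shell
# Hypothetical helper: expected search thread pool size for a given core count,
# using (1.5 x cores) + 1 with integer arithmetic (assumed to round down).
search_pool_size() {
  cores=$1
  echo $(( (cores * 3) / 2 + 1 ))
}

search_pool_size 4    # prints 7, the pool size seen in the exception
search_pool_size 24   # prints 37
```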

 

NOTE:

As a workaround, queue_size can be increased using the steps below, but setting it too high might degrade query performance. The other options, as mentioned above, are to add cluster nodes or increase hardware resources.

1). Check the current queue_size; it should be 1000:

curl -XGET http://<ES_IP>:9200/_nodes/thread_pool

2). Take a backup of the file /APG_HOME/Databases/Elasticsearch/Default/conf/elasticsearch.yml 

3). Add the parameter below at the end of that file to increase the queue size, for example to 2000. Start the line in the first column, with no leading spaces:

thread_pool.search.queue_size: 2000

4). Restart the ES service.

5). Verify that queue_size is now 2000:

curl -XGET http://<ES_IP>:9200/_nodes/thread_pool
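Steps 2 and 3 can also be scripted. The following is a minimal sketch under stated assumptions: the function name set_search_queue_size and the .bak backup suffix are hypothetical, and the script deliberately refuses to touch a file that already contains the setting. It does not restart ES; do that separately as in step 4.

```shell
# Hypothetical helper: back up an elasticsearch.yml and append the search
# queue size setting. Arguments: <path to elasticsearch.yml> <queue size>.
set_search_queue_size() {
  conf=$1
  size=$2
  [ -f "$conf" ] || { echo "config not found: $conf" >&2; return 1; }

  # Step 2: keep a copy of the original file (the .bak suffix is an assumption).
  cp -p "$conf" "$conf.bak" || return 1

  # Do not add a duplicate key; an existing value should be edited by hand.
  if grep -q '^thread_pool\.search\.queue_size:' "$conf"; then
    echo "thread_pool.search.queue_size is already set in $conf" >&2
    return 1
  fi

  # Make sure the file ends with a newline so the new key starts on its own line.
  [ -n "$(tail -c 1 "$conf")" ] && echo >> "$conf"

  # Step 3: append the setting in the first column, with no leading spaces.
  printf 'thread_pool.search.queue_size: %s\n' "$size" >> "$conf"
}

# Usage:
#   set_search_queue_size /APG_HOME/Databases/Elasticsearch/Default/conf/elasticsearch.yml 2000
```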