How to handle GPTEXT cluster replica imbalance

Article ID: 296666

Products

VMware Tanzu Greenplum

Issue/Introduction

A GPTEXT cluster can become imbalanced. When that happens, some Solr nodes carry more load than others, and an overloaded node may go offline with out-of-memory (OOM) errors.


This article covers how to detect and fix a GPTEXT cluster replica imbalance.


Environment

Product Version: 6.13

Resolution

How to find if the index is in an imbalanced state

Use the following query to count the replicas of each shard on each host. Ideally, each host holds only one replica of any given shard.

If any replica_count is greater than 1, the index is not balanced.

# select count(replica_name) as replica_count, shard_name, regexp_replace(node_name, ':.*', '', 'g') as hostname from gptext.index_summary('demo.public.message') group by (shard_name,hostname) order by 2;
 replica_count | shard_name | hostname
---------------+------------+----------
             1 | shard1     | sdw1
             1 | shard1     | sdw2
             1 | shard2     | sdw1
             1 | shard2     | sdw2
             1 | shard3     | sdw1
             1 | shard3     | sdw2
             1 | shard4     | sdw1
             1 | shard4     | sdw2
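
On an index with many shards, scanning the output by eye is error-prone. The following is a minimal sketch that prints only the shard/host pairs whose replica_count exceeds 1. The function name find_imbalanced is hypothetical, and the sketch assumes the query result is piped in as unaligned, pipe-delimited rows (psql -A -t -F'|'). The database name demo in the usage comment is inferred from the index name and is also an assumption.

```shell
# Hypothetical helper, not part of GPText.
# Reads rows shaped like "replica_count|shard_name|hostname" on stdin
# and prints one line for every row whose replica_count is greater than 1.
find_imbalanced() {
  awk -F'|' '$1 + 0 > 1 { print $2 " on " $3 ": " $1 " replicas" }'
}

# Usage sketch (assumes database "demo" and the index from this article):
# psql -d demo -At -F'|' -c "select count(replica_name), shard_name, regexp_replace(node_name, ':.*', '', 'g') as hostname from gptext.index_summary('demo.public.message') group by 2, 3 order by 2;" | find_imbalanced
```

No output means every host holds at most one replica of each shard.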


You may also observe that some nodes have more leader replicas than others.

This is not an imbalance issue: when the index is searched, the load is sent to both leader and follower replicas, so it is evenly distributed.


For example:

select count(*),node_name from gptext.index_status('demo.public.message') where is_leader = 't' group by 2;
 count |    node_name
-------+-----------------
     7 | sdw4:18984_solr
     4 | sdw3:18984_solr
     7 | sdw2:18983_solr
     4 | sdw4:18983_solr
     1 | sdw1:18983_solr
     1 | sdw3:18983_solr
     8 | sdw2:18984_solr
 

How to fix the imbalance issue

1. Make sure all Solr nodes are up and the index is in the green state:

# gptext-state -D 


2. To preview the rebalance plan without executing it, run this command:

# gptext-rebalance index --index <INDEX_NAME> --show_plan_only


3. To execute the rebalance, run this command:

# gptext-rebalance index --index <INDEX_NAME>


Note: Copying the data across hosts can take a while. The duration depends on how much data the index holds.

4. If the rebalance is interrupted partway (for example, by a network failure or an unstable network), follow these steps to resume:

a. Wait for all Solr nodes to come back up, then run this command to confirm they are running:

# gptext-state -D

b. Rerun the rebalance command. It continues from where it stopped:

# gptext-rebalance index --index <INDEX_NAME>
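
The rerun in step 4 can be scripted. The following is a minimal sketch of a generic retry wrapper. The function name retry_until_success is hypothetical, and the sketch assumes that gptext-rebalance exits with a non-zero status when it is interrupted, which this article does not confirm. Still check gptext-state -D between attempts to verify that all Solr nodes are back up.

```shell
# Hypothetical wrapper, not part of GPText.
# Usage: retry_until_success MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Reruns COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
# Assumption: COMMAND exits non-zero when it fails or is interrupted.
retry_until_success() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    attempt=$((attempt + 1))
    # Pause before the next attempt, but not after the last one.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay_seconds"
    fi
  done
  return 1
}

# Usage sketch (attempt count and delay are arbitrary example values):
# retry_until_success 3 60 gptext-rebalance index --index <INDEX_NAME>
```

The delay gives the Solr nodes time to recover before the next attempt. A fixed attempt limit keeps the script from looping indefinitely if the underlying problem persists.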