A GPTEXT cluster can sometimes end up in an imbalanced state, where some Solr nodes carry more load than others. An overloaded node can run out of memory (OOM) and go offline.
This article explains how to detect and fix replica imbalance in a GPTEXT cluster.
Use the following query to count the replicas of each shard on each host. Ideally, each host holds only one replica of any given shard.
If any replica_count is greater than 1, the index is not balanced.
# select count(replica_name) as replica_count, shard_name, regexp_replace(node_name, ':.*', '', 'g') as hostname
    from gptext.index_summary('demo.public.message')
    group by shard_name, hostname
    order by 2;

 replica_count | shard_name | hostname
---------------+------------+----------
             1 | shard1     | sdw1
             1 | shard1     | sdw2
             1 | shard2     | sdw1
             1 | shard2     | sdw2
             1 | shard3     | sdw1
             1 | shard3     | sdw2
             1 | shard4     | sdw1
             1 | shard4     | sdw2
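If the index has many shards, a small filter can pick out the problem rows instead of reading the whole result. The following is a minimal sketch, assuming the query above is run through psql with unaligned, tuples-only output (`-A -t`), so each row arrives as `replica_count|shard_name|hostname`. `check_replica_balance` is a hypothetical helper introduced here, not part of GPTEXT. It prints every shard that has more than one replica on the same host and returns exit status 1 if it finds any, so it can also be used in a monitoring script.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of GPTEXT.
# Reads rows of the form "replica_count|shard_name|hostname" (psql -A -t output)
# and reports every shard that has more than one replica on the same host.
# Exit status: 0 if balanced, 1 if any imbalance was found.
check_replica_balance() {
  awk -F'|' '
    # $1 is the replica count; a value above 1 means this host holds
    # several replicas of the same shard.
    $1 > 1 {
      printf "imbalanced: %s has %s replicas on %s\n", $2, $1, $3
      found = 1
    }
    END { exit (found ? 1 : 0) }
  '
}

# Example usage (replace <DATABASE> with your database name):
#   psql -d <DATABASE> -A -t -c "select count(replica_name), shard_name, regexp_replace(node_name, ':.*', '', 'g') from gptext.index_summary('demo.public.message') group by 2, 3 order by 2;" | check_replica_balance
```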
You may also notice that some nodes hold more leader replicas than others.
This is not an imbalance problem: when the index is searched, the load is sent to both leader and follower replicas, so it stays evenly distributed.
For example:
# select count(*), node_name
    from gptext.index_status('demo.public.message')
    where is_leader = 't'
    group by 2;

 count |    node_name
-------+-----------------
     7 | sdw4:18984_solr
     4 | sdw3:18984_solr
     7 | sdw2:18983_solr
     4 | sdw4:18983_solr
     1 | sdw1:18983_solr
     1 | sdw3:18983_solr
     8 | sdw2:18984_solr
1. Make sure all Solr nodes are up and the index is in the green state:
# gptext-state -D
2. To preview the rebalancing plan without moving any data, run:
# gptext-rebalance index --index <INDEX_NAME> --show_plan_only
3. To run the rebalance, use:
# gptext-rebalance index --index <INDEX_NAME>
Note: Copying the data between hosts can take a while; the time required depends on how much data the index contains.
4. If the rebalance is interrupted partway through (for example, by a network failure or an unstable network), resume it as follows:
a. Wait for all Solr nodes to come back online, then run this command to confirm they are up and running:
# gptext-state -D
b. Rerun the rebalance command. It continues from where the previous run stopped:
# gptext-rebalance index --index <INDEX_NAME>
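The steps above can be strung together in a small wrapper script. The sketch below is hypothetical: `rebalance_index`, `MAX_ATTEMPTS`, `RETRY_WAIT`, `GPTEXT_STATE`, and `GPTEXT_REBALANCE` are names introduced here, not GPTEXT settings. It relies on an unconfirmed assumption that `gptext-state -D` and `gptext-rebalance` return a non-zero exit status when the cluster is unhealthy or the rebalance is interrupted; check that behavior on your GPTEXT version before relying on it. The script prints the plan, confirms the cluster is healthy, runs the rebalance, and on failure waits and reruns it, relying on the behavior described in step 4b that a rerun resumes where the previous run stopped. It does not pause for you to review the plan, so run step 2 by hand first if you want to inspect it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around steps 1-4; not part of GPTEXT.
# ASSUMPTIONS (not confirmed by GPTEXT documentation):
#   - gptext-state -D exits non-zero when the cluster is not healthy;
#   - gptext-rebalance exits non-zero when the rebalance is interrupted.
# MAX_ATTEMPTS and RETRY_WAIT (seconds) are illustrative defaults.
MAX_ATTEMPTS=${MAX_ATTEMPTS:-3}
RETRY_WAIT=${RETRY_WAIT:-60}
# Command names can be overridden, e.g. with stub functions for a dry run.
GPTEXT_STATE=${GPTEXT_STATE:-gptext-state}
GPTEXT_REBALANCE=${GPTEXT_REBALANCE:-gptext-rebalance}

rebalance_index() {
  local index=$1
  local attempt=1

  # Step 2: print the rebalancing plan; no data is moved.
  "$GPTEXT_REBALANCE" index --index "$index" --show_plan_only || return 1

  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    # Steps 1 and 4a: confirm all Solr nodes are up before (re)starting.
    if "$GPTEXT_STATE" -D; then
      # Steps 3 and 4b: a rerun continues from where the last run stopped.
      if "$GPTEXT_REBALANCE" index --index "$index"; then
        echo "rebalance of $index finished (attempt $attempt)"
        return 0
      fi
      echo "rebalance of $index failed on attempt $attempt" >&2
    else
      echo "cluster not healthy on attempt $attempt" >&2
    fi
    attempt=$((attempt + 1))
    # Give the Solr nodes time to come back before the next attempt.
    if [ "$attempt" -le "$MAX_ATTEMPTS" ]; then
      sleep "$RETRY_WAIT"
    fi
  done

  echo "giving up on $index after $MAX_ATTEMPTS attempts" >&2
  return 1
}
```

For example, after sourcing the script, run `rebalance_index <INDEX_NAME>`.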