GemFire: Optimally designed hashcodes help drive data balance for PRs


Article ID: 294459

Updated On:

Products

VMware Tanzu Gemfire

Issue/Introduction

Several pieces of the puzzle must fit together for GemFire systems using Partitioned Regions (PRs) to achieve optimal data balance and performance.

We have many KB articles on keeping your data balanced, including running gfsh>rebalance when necessary. This article fills in one piece of the puzzle: customer-designed hashcodes.

Running gfsh>rebalance, balancing buckets, and choosing the optimal number of buckets will all prove insufficient if the hashcodes are badly designed.

The question, then, is: what can customers do, when designing hashcodes for their Partitioned Regions, to get the best balance they can?

Environment

Product Version: 9.15
OS: ALL

Resolution

Understanding this requires some basic math. The customer selects a key to use for a given region and then designs a hashcode for that key. The hashcode is simply a mechanism that takes the key and converts it to a number.

This conversion is very important. It must spread different keys across a wide, random-looking range of values, while always returning the same value for the same key. For example, if your "black box" hashcode always returns 24, regardless of the key, every entry ends up in the same single GemFire Partitioned Region bucket.

That is the worst case. If the hashcode instead produces a well-spread distribution of values, up to some very large value N, you have the best chance of seeing entries distributed evenly across all of your total-num-buckets buckets. Here is a simple illustration:

[Figure: KeyToBucketMapping]
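
To make the contrast concrete, the following is a minimal sketch of a worst-case and a more reasonable hashcode. Both key classes (ConstantHashKey and OrderKey) and their fields are hypothetical and exist only for illustration; they are not taken from GemFire or from any customer system. Serialization, which a real region key also needs, is omitted. Objects.hash is just one convenient way to mix fields, and whether it spreads your particular keys well enough still has to be measured.

```java
import java.util.Objects;

// Hypothetical key classes, for illustration only; not part of GemFire.

/** Worst case: every key reports the same hashcode. */
final class ConstantHashKey {
    private final String customerId;

    ConstantHashKey(String customerId) {
        this.customerId = customerId;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ConstantHashKey
                && customerId.equals(((ConstantHashKey) other).customerId);
    }

    @Override
    public int hashCode() {
        return 24; // legal Java, but every entry maps to the same bucket
    }
}

/** Better: the hashcode mixes every field that takes part in equals(). */
final class OrderKey {
    private final String customerId;
    private final int orderNumber;

    OrderKey(String customerId, int orderNumber) {
        this.customerId = customerId;
        this.orderNumber = orderNumber;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof OrderKey)) {
            return false;
        }
        OrderKey that = (OrderKey) other;
        return orderNumber == that.orderNumber
                && customerId.equals(that.customerId);
    }

    @Override
    public int hashCode() {
        // Different field values generally give different results.
        return Objects.hash(customerId, orderNumber);
    }
}
```

Equal keys must always return equal hashcodes, which is why hashCode() here uses exactly the fields that equals() compares.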

The perfect hashcode would place the same number of keys in each bucket. With the default total-num-buckets of 113, for example, if the hashcode black box returns 1 for one key and 114 for another, both keys end up in bucket 1, because 1 modulo 113 = 1 and 114 modulo 113 = 1. Any key whose hashcode N satisfies N modulo 113 = 1 ends up in that same bucket.
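
The modulo arithmetic above can be sketched in a few lines. This is a simplified model of the mapping described in this article, not GemFire's internal routing code. The class and method names (BucketMapping, bucketFor) are hypothetical, and using Math.floorMod to keep negative hashcodes in range is an assumption of this sketch.

```java
// Simplified model of the key-to-bucket mapping; names are hypothetical.
final class BucketMapping {

    /** Maps a hashcode onto a bucket index in the range 0..totalNumBuckets-1. */
    static int bucketFor(int hashCode, int totalNumBuckets) {
        // floorMod never returns a negative value for a positive divisor,
        // so negative hashcodes still land on a valid index.
        return Math.floorMod(hashCode, totalNumBuckets);
    }
}
```

With 113 buckets, bucketFor(1, 113) and bucketFor(114, 113) both evaluate to 1, matching the example above.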

When designing hashcodes, understand your keys and the distribution of values they will produce over the range 0 to N. This can be tested and studied. Once you have a great distribution from 0 to N, you will also have a great distribution across your GemFire PR buckets, independent of the choice of total-num-buckets.
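
One way to test and study a hashcode offline is to run a representative sample of keys through the same modulo mapping and count how many land in each bucket. The sketch below assumes the simplified modulo model described in this article. BucketHistogram and its methods are hypothetical helper names, not a GemFire API.

```java
import java.util.Arrays;

// Hypothetical offline check; it does not talk to a GemFire cluster.
final class BucketHistogram {

    /** Counts how many of the given keys fall into each bucket index. */
    static int[] countPerBucket(Iterable<?> keys, int totalNumBuckets) {
        int[] counts = new int[totalNumBuckets];
        for (Object key : keys) {
            counts[Math.floorMod(key.hashCode(), totalNumBuckets)]++;
        }
        return counts;
    }

    /** Size of the emptiest bucket (0 if there are no buckets). */
    static int smallest(int[] counts) {
        return Arrays.stream(counts).min().orElse(0);
    }

    /** Size of the fullest bucket (0 if there are no buckets). */
    static int largest(int[] counts) {
        return Arrays.stream(counts).max().orElse(0);
    }
}
```

A small gap between smallest(...) and largest(...) for a realistic key sample suggests a well-balanced hashcode. A large gap, or many empty buckets, points to a hashcode worth redesigning.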