How to ensure consistent hash function output across different GDPB major release.
search cancel

How to ensure consistent hash function output across different GDPB major release.

book

Article ID: 296795

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

For some use case, it's necessary to ensure consistent hash function output for the same input id. However, due to postgresql design, hash function output is different among each major postgresql release, say postgres 8 and postgresql 9.  So in GPDB, hash function output is also different between GPDB v5 and v6.
In GPDB v5:
gpadmin=# select version();
                                                                                               version

----------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------
 PostgreSQL 8.3.23 (Greenplum Database 5.29.0 build commit:4c466dc4c01f3685d1118df86b7473a84c4e0e9f) on x86_64-pc-linux-gnu,
 compiled by GCC gcc (GCC) 6.2.0, 64-bit compiled on Sep 30 2021 02:00:30
(1 row)
gpadmin=# select hashtext('abc');
  hashtext
------------
 1542277287
(1 row)

In GPDB v6:
gpadmin=# select version();
                                                                                                version

----------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------
 PostgreSQL 9.4.26 (Greenplum Database 6.20.5 build commit:bd00a8103dd57c80bd98192b5b3516fd5179912b) on x86_64-unknown-linux
-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Jun  1 2022 22:53:46
(1 row)

gpadmin=# select hashtext('abc');
  hashtext
------------
 -785388649
(1 row)


Environment

Product Version: 6.20

Resolution

For those use case where it's necessary to have consistent hash function output (for same input id), we can create a user defined hash function to replace the system hash function.  Or we can use a 3rd party extension  to create a new function.
https://github.com/petere/pgvihash
  1. Download the extension file from above link.
  2. put the file at the master, unzip
  3. as root, cd to this directory, make sure GCC package already installed.  
  4. as root,  run "export PATH=$PATH:$GPHOME/bin/"  , next  "make"
  5. make sure below file exist at master node, if not, copy them from unzip directory to below location.
$GPHOME/share/postgresql/extension/vihash.control
$GPHOME/lib/postgresql/vihash.so
       6. scp both file to each segment host.
       7. For GPDB v6, psql,  then :  create extension vihash.  For GDPB v5, psql, then run below :
CREATE OR REPLACE FUNCTION vihashtext(text) RETURNS int
AS '$libdir/vihash'
STRICT IMMUTABLE
LANGUAGE C;
         8. verify the function :  psql ,  select vihashtext('abc');