Fast Data Masker - HASHLOV is not selecting consistent masking values from the seed list

Article ID: 258793

Products

CA Test Data Manager (Data Finder / Grid Tools)

Issue/Introduction

We are using a seed list that contains multiple columns for masking in FDM. FDM uses only a single column from the seed list and ignores the other columns.

Example: a seed list of addresses split into multiple columns, such as street number, street name, postal code, city, province, and country.

When we run the masking job using HASHLOV, we would like the results to be consistent, with every value chosen from the same seed list row. However, the values are chosen from different rows of the seed list. For example, the postal code is from Germany, the city is from the UK, the province is from Canada, and the country shows as US.

 

Environment

Release: 4.10

Cause

The values were selected inconsistently from the seed list because no common hash column was defined in PARM3. Without PARM3, HASHLOV hashes each field on the column being masked: the Address column was hashed on itself, the Postal Code column on itself, and so on. Each column therefore produced a different hash index and selected its value from a different seed list row.

HASHLOV takes the following parameters:

  • PARM1 (Mandatory) = the seedlist name from GTSRC_REFERENCE_LOV1.RL_REF_ID
  • PARM2 (Optional) = Integer between 1 and 30 (default 1). This identifies which column of data is returned (RL_REF_VALUE to RL_REF_VALUE30)
  • PARM3 (Optional) = Field name that contains the value to be hashed. The default is the field to be masked.
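The parameters and the cause above can be illustrated with a small Python model. This is a sketch under stated assumptions, not FDM's implementation: the seed list rows, the CUST_ID column, and the SHA-256-modulo hash are all hypothetical stand-ins. Only the roles of PARM2 (1-based column) and PARM3 (the value to hash) follow the parameter list. It contrasts omitting PARM3, where each field hashes on itself and may land on a different row, with passing the same PARM3 for every field, where all fields come from one row.

```python
import hashlib

# Hypothetical seed list. Each tuple stands in for one GTSRC_REFERENCE_LOV1 row;
# position 0 plays RL_REF_VALUE, position 1 RL_REF_VALUE2, and so on.
SEED_LIST = [
    ("10115", "Berlin", "Berlin", "Germany"),
    ("SW1A 1AA", "London", "England", "UK"),
    ("M5V 2T6", "Toronto", "Ontario", "Canada"),
    ("10001", "New York", "NY", "US"),
]


def hash_index(value: str, size: int) -> int:
    """Map a value to a seed list row. SHA-256 is a stand-in, not FDM's hash."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def hashlov(seed_list, parm2: int, hash_value: str) -> str:
    """Return column parm2 (1-based) of the row selected by hash_value."""
    row = seed_list[hash_index(hash_value, len(seed_list))]
    return row[parm2 - 1]


# One input record; CUST_ID is a hypothetical identifying column.
record = {"CUST_ID": "C-1001", "POSTAL": "90210", "CITY": "Beverly Hills",
          "PROVINCE": "CA", "COUNTRY": "US"}
parm2_for = {"POSTAL": 1, "CITY": 2, "PROVINCE": 3, "COUNTRY": 4}

# PARM3 omitted: each field is hashed on its own value, so each can hit a different row.
default_parm3 = {f: hashlov(SEED_LIST, c, record[f]) for f, c in parm2_for.items()}
# PARM3 = CUST_ID for every field: one hash value, one row, one consistent address.
common_parm3 = {f: hashlov(SEED_LIST, c, record["CUST_ID"]) for f, c in parm2_for.items()}

print("default PARM3:", default_parm3)
print("common PARM3: ", common_parm3)
```

With the common hash value, the four masked fields always form one seed list row. With the default, agreement between fields is left to chance.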

Resolution

We build the hash index from the hash key and the column or columns you select to hash on. The values in those columns determine the hash index, which selects the seed list row used for masking (PARM2 then selects the column within that row). The HASHLOV masking function guarantees consistency, but not uniqueness: even with a very large seed list, there is still a slim chance that different values return the same hash index. Hashing on multiple columns gives the hash more input with which to tell records apart. For example, if your data contains First Name, Middle Initial, and Last Name, and you hash on all three columns, then John D. Doe and John R. Doe have a greater chance of being recognized as separate individuals and masked as two different people.
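The effect of hashing on several columns, and why consistency does not imply uniqueness, can be sketched in Python. This is a hypothetical model: the way the columns are joined into one key (an ASCII unit separator), the SHA-256 stand-in hash, and the seed list sizes are assumptions for illustration, not FDM's documented internals.

```python
import hashlib


def build_key(*columns: str) -> str:
    """Combine the chosen hash columns into one key (hypothetical scheme).

    Joining with the ASCII unit separator keeps ("ab", "c") distinct from ("a", "bc").
    """
    return "\x1f".join(columns)


def hash_index(key: str, size: int) -> int:
    """Map a key to a seed list row. SHA-256 is a stand-in, not FDM's hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


SEED_SIZE = 1000  # hypothetical seed list length
people = [("John", "D", "Doe"), ("John", "R", "Doe")]

# Hashing on Last Name alone: both people share a key, so they always share a row.
last_name_keys = [build_key(last) for _, _, last in people]
# Hashing on all three columns: the keys differ, so the rows usually differ too.
full_keys = [build_key(*p) for p in people]

print([hash_index(k, SEED_SIZE) for k in last_name_keys])
print([hash_index(k, SEED_SIZE) for k in full_keys])

# Consistency, not uniqueness: 11 distinct keys hashed into 10 rows must
# produce at least one shared row (pigeonhole principle).
small_seed = 10
many_keys = [build_key("John", chr(ord("A") + i), "Doe") for i in range(11)]
distinct_rows = {hash_index(k, small_seed) for k in many_keys}
print(len(distinct_rows), "distinct rows for", len(many_keys), "distinct keys")
```

The same key always maps to the same row, which is the consistency guarantee. Distinct keys only usually map to distinct rows, and collisions become certain once there are more distinct keys than seed list rows.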

Because your input file contains duplicate data, it would be better not to hash on a column with unique values: every row would then hash on a different value and tend to receive its own masked value, which is not what you want. Instead, choose the column, or combination of columns, that best identifies the records in your input data as your hash column (PARM3), and use the same PARM3 value for every field that is populated from the seed list.
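The recommendation above can be modeled in Python with a hypothetical input that repeats one customer under two row IDs. The seed list, the ROW_ID and CUST_ID columns, and the SHA-256 stand-in hash are all assumptions. Every address field shares one hash field (playing the role of PARM3), so each masked row is a single seed list row. Hashing on the identifying column masks the duplicate records identically. Hashing on the unique row ID treats them independently.

```python
import hashlib

# Hypothetical seed list rows: (postal code, city, province, country).
SEED_LIST = [
    ("10115", "Berlin", "Berlin", "Germany"),
    ("SW1A 1AA", "London", "England", "UK"),
    ("M5V 2T6", "Toronto", "Ontario", "Canada"),
    ("10001", "New York", "NY", "US"),
]
ADDRESS_FIELDS = ["POSTAL", "CITY", "PROVINCE", "COUNTRY"]  # PARM2 = 1..4


def hash_index(value: str, size: int) -> int:
    """Map a value to a seed list row. SHA-256 is a stand-in, not FDM's hash."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def mask_row(row: dict, hash_field: str) -> dict:
    """Mask all address fields using the same hash field (the same PARM3)."""
    idx = hash_index(row[hash_field], len(SEED_LIST))
    masked = dict(row)
    for parm2, field in enumerate(ADDRESS_FIELDS, start=1):
        masked[field] = SEED_LIST[idx][parm2 - 1]
    return masked


# Customer C-1001 appears twice, under different (unique) ROW_IDs.
rows = [
    {"ROW_ID": "1", "CUST_ID": "C-1001", "POSTAL": "90210", "CITY": "A", "PROVINCE": "CA", "COUNTRY": "US"},
    {"ROW_ID": "2", "CUST_ID": "C-2002", "POSTAL": "73301", "CITY": "B", "PROVINCE": "TX", "COUNTRY": "US"},
    {"ROW_ID": "3", "CUST_ID": "C-1001", "POSTAL": "90210", "CITY": "A", "PROVINCE": "CA", "COUNTRY": "US"},
]

# Identifying column: rows 1 and 3 always receive the same masked address.
by_customer = [mask_row(r, "CUST_ID") for r in rows]
# Unique column: rows 1 and 3 are hashed independently and may differ.
by_row_id = [mask_row(r, "ROW_ID") for r in rows]

for m in by_customer:
    print(m["ROW_ID"], m["CUST_ID"], [m[f] for f in ADDRESS_FIELDS])
```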

Additional Information

For a list of the masking functions, with descriptions and parameters, see https://techdocs.broadcom.com/us/en/ca-enterprise-software/devops/test-data-management/4-10/reference/masking-functions-and-parameters.html