We need to obfuscate over 400,000 unique credit card numbers. We are doing this via tool obfuscation.
Our seed rows are in the GTSRC_REFERENCE_LOV1 table under an RL_REF_ID of EJCREDITCARD, which used to contain 250,000 rows.
When we ran the credit card obfuscation we encountered a large number of duplicate credit card values, which we attributed to having more unique credit card numbers than we had obfuscations for them (400,000 vs 250,000).
We wrote a program to generate 1,000,000 unique obfuscated credit card values and we removed the old values from the GTSRC_REFERENCE_LOV1 table and inserted our new values.
Here is a list of the first 10 and last 10 rows in the seed table GTSRC_REFERENCE_LOV1.
PCAGRID.GTSRC_REFERENCE_LOV1
PROD ----------FETCH STATUS: COMPLETE--------------------------
RL_REF_ID     RL_RN    RL_TOTAL  RL_REF_VALUE
EJCREDITCARD  1        1000000   7000000000012349
EJCREDITCARD  2        1000000   7000000000024682
EJCREDITCARD  3        1000000   7000000000037023
EJCREDITCARD  4        1000000   7000000000049366
EJCREDITCARD  5        1000000   7000000000061700
EJCREDITCARD  6        1000000   7000000000074042
EJCREDITCARD  7        1000000   7000000000086384
EJCREDITCARD  8        1000000   7000000000098728
EJCREDITCARD  9        1000000   7000000000111067
EJCREDITCARD  10       1000000   7000000000123401
EJCREDITCARD  999991   1000000   7000012339888943
EJCREDITCARD  999992   1000000   7000012339901282
EJCREDITCARD  999993   1000000   7000012339913626
EJCREDITCARD  999994   1000000   7000012339925968
EJCREDITCARD  999995   1000000   7000012339938300
EJCREDITCARD  999996   1000000   7000012339950644
EJCREDITCARD  999997   1000000   7000012339962987
EJCREDITCARD  999998   1000000   7000012339975328
EJCREDITCARD  999999   1000000   7000012339987661
EJCREDITCARD  1000000  1000000   7000012340000009
We believe we have entered the data correctly, with an incrementing number in the RL_RN column and the total number of obfuscation seed values in the RL_TOTAL column.
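The layout described above can be checked mechanically before a run. The following is a minimal sketch in Python, assuming the rows for one RL_REF_ID have already been fetched as (RL_RN, RL_TOTAL, RL_REF_VALUE) tuples. The function name check_seed_rows is hypothetical, and the three rules it enforces reflect our understanding of the seed table, not documented tool behavior.

```python
def check_seed_rows(rows):
    """Return a list of problems found in the seed rows for one RL_REF_ID.

    rows: list of (rl_rn, rl_total, rl_ref_value) tuples.
    The rules below are assumptions about what the tool expects.
    """
    problems = []
    n = len(rows)

    # Assumed rule 1: RL_RN runs from 1 to N with no gaps or repeats.
    if sorted(r[0] for r in rows) != list(range(1, n + 1)):
        problems.append("RL_RN is not a gap-free sequence 1..N")

    # Assumed rule 2: every row carries the total row count in RL_TOTAL.
    if any(r[1] != n for r in rows):
        problems.append("RL_TOTAL does not equal the row count on every row")

    # Assumed rule 3: no seed value appears twice.
    if len({r[2] for r in rows}) != n:
        problems.append("RL_REF_VALUE contains duplicates")

    return problems
```

An empty list means the rows satisfy all three assumed rules. It does not prove that the tool reads every row.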
When we ran the tool obfuscation we still had a significant number of duplicates.
The output from our job is below:
JOB OBMJBFB(JOB04280) SUBMITTED RC=0
JOB OBLJBFB(JOB04281) SUBMITTED RC=4
DCFP0001.CARD_31811
DCFP0001.A31811X1
155342 DUPLICATE KEY ERRORS
In addition, I saw the following in the SYSOUT for the job which makes me think that the tool is only ingesting 250,000 rows from the seed table (the previous amount).
Program name GTXDEF
Program version 6.2.00
Program date 2020-09-01
Program Name GTXWHR
Program Version 6.2.00
Program Date 2020-09-01
MAPCSV Out:
Passed (80) =>Table,Column,Function,Parm1,Parm2,Parm3,Parm4,Parm5,Parm6,Parm7,Parm8,Parm9,Parm
Passed (80) =>CARD_31811,CARD_NO,HASHLOV,EJCREDITCARD,1,,,,,,,,,Y,,,,,,,,,,,,,
Run Type ==> F
File Type ==> F
File Format ==> X
WHERE ==> Y
Bundle IDs ==> 00000
Sort Cards ==> 00004
MAPCSV Inp ==> 00002
Header ==> 00001
Blank ==> 00000
Comment ==> 00000
Standard ==> 00001
MAPCSV SUB ==> 00002
MAPCSV OUT ==> 00002
MAPLEN Out ==> 01000
Ignored ==> 00000
Replaced ==> 00000
Generated ==> 00000
Program name GTXMAP
Program version 6.2.01
Program date 2020-09-16
Program name GTXMSKF
Program version 6.2.04
Program date 2020-11-05
Processed 000250000 records 08:48:46 (this was the old number of credit card entries; I don't think it's picking up all 1,000,000 credit card numbers)
ACF0C038 ACF2 LOGONID ATTRIBUTES HAVE REPLACED DEFAULT USER ATTRIBUTES
READY
DSN SYSTEM(PROD)
DSN
RUN PROGRAM(GTXMSKF) PLAN(GTXMSKF) LIB('SDB3ETDM.P.LOADLIB')
DSN
END
READY
Is there some parm set somewhere where it is still instructing the program that we only have 250,000 seed table rows for EJCREDITCARD?
Release : 6.2
TDM Mainframe.
N/A
HASHLOV/HASHLOV1 by itself does not guarantee unique values.
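To see why a larger seed table alone does not remove duplicates, it helps to estimate how often a hash-style lookup sends two different inputs to the same seed row. The sketch below is not the HASHLOV algorithm, whose internals are not described here. It assumes each distinct input is mapped to a seed row uniformly at random, and the function name expected_duplicates is hypothetical.

```python
def expected_duplicates(n_inputs, n_seeds):
    """Expected number of inputs that land on an already-used seed row.

    Assumption: every distinct input picks one of n_seeds rows uniformly
    at random, independently of the others (an idealized hash).
    """
    # Expected number of seed rows hit at least once.
    expected_distinct = n_seeds * (1.0 - (1.0 - 1.0 / n_seeds) ** n_inputs)
    return n_inputs - expected_distinct


# Compare the old and new seed table sizes for 400,000 distinct inputs.
for seeds in (250_000, 1_000_000):
    print(seeds, round(expected_duplicates(400_000, seeds)))
```

Under this assumption, 400,000 inputs against 1,000,000 seed rows still produce on the order of 70,000 duplicates, compared with roughly 200,000 against 250,000 rows. Adding seed rows lowers the count but cannot bring it to zero.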
If you want uniqueness, then you’ll need to create a lookup/cross-reference table.
One column holds the “original” number, and the second column holds the “obfuscated” number.
You can implement this in two ways:
(1) Using the existing GTSRC_XREF and specifying masking routines that leverage XREF
(2) Using your own table and using SQLFUNCTION masking to SELECT the obfuscated value based on the original.
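The second option can be sketched as follows. This is an illustration only: it uses an in-memory SQLite database in place of the real database, the table name card_xref, its column names, and the helper functions are hypothetical, and the sample "original" card numbers are made up. It does not show the GTSRC_XREF layout or the SQLFUNCTION parameter syntax.

```python
import sqlite3


def build_xref(conn, originals, seed_values):
    """Create a one-to-one original -> obfuscated cross-reference table."""
    distinct = sorted(set(originals))
    if len(distinct) > len(seed_values):
        raise ValueError("not enough seed values for a one-to-one mapping")
    # PRIMARY KEY and UNIQUE make the database itself reject duplicates.
    conn.execute(
        "CREATE TABLE card_xref ("
        " original_no TEXT PRIMARY KEY,"
        " obfuscated_no TEXT NOT NULL UNIQUE)"
    )
    # Pairing in sorted order is for illustration only.
    conn.executemany(
        "INSERT INTO card_xref (original_no, obfuscated_no) VALUES (?, ?)",
        zip(distinct, seed_values),
    )


def lookup(conn, original_no):
    """Return the obfuscated value for an original, or None if unmapped."""
    row = conn.execute(
        "SELECT obfuscated_no FROM card_xref WHERE original_no = ?",
        (original_no,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
build_xref(
    conn,
    ["4111111111111111", "5500000000000004", "4111111111111111"],
    ["7000000000012349", "7000000000024682", "7000000000037023"],
)
print(lookup(conn, "4111111111111111"))
```

Because each original is stored once and each obfuscated value is constrained to be unique, two different originals can never receive the same replacement, which is the guarantee a hash-based lookup does not give.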
We do have a recent “reflov” function that would have worked, but unfortunately it has not been implemented on Mainframe yet.
https://techdocs.broadcom.com/us/en/ca-enterprise-software/devops/test-data-management/4-9/reference/masking-functions-and-parameters.html