Need the latest documentation and supported JARs to mask Hadoop data

book

Article ID: 214226


Products

CA Test Data Manager (Data Finder / Grid Tools)

Issue/Introduction

We would like to do a PoC that will involve the following within Hadoop (Hive and/or HDFS):
1. Create connections to Hadoop environments

2. Create data models for data in HDFS/HIVE

3. Tag/mark PHI/PII data

4. Assign masking functions 

5. Execute Masking jobs 

etc. 

Can you point us to the latest set of JAR files that we must install? What level of access would the TDM user require in the environment/Edge Node/server, etc.?
Also, what is the step-by-step process a user must go through, including the interface details, to mask data within Hadoop environments?

Would it be possible for someone to actively guide us to set up the environment and ensure the setup is correct (steps to be performed)?

Environment

Release : 4.9.1 Test Data Manager

Component : Hadoop Integration

Resolution


Looking at the Supported Data Sources - Non-Relational Data Sources page, support for Hadoop (Hive) is very limited.
See https://techdocs.broadcom.com/us/en/ca-enterprise-software/devops/test-data-management/4-9/installing/supported-data-sources.html

  • Dynamic Test Data Reservation - TDM Portal = Not Supported
  • Data Generation - TDM Portal = Not Supported
  • Data Generation - Datamaker = Not Supported
  • Data Masking = Certified
  • Data Subsetting = Not Supported
  • Test Match = Not Supported
  • Virtual Test Data Management = Not Supported
  • Data Modelling and PII Audit = Not Supported

To help better set the expectations for your POC, and to answer your specific questions:

  1. Create connections to Hadoop environments

    Answer - TDM components do not interface directly with the Hadoop environment, so there are no connection profiles in TDM that need to be configured. For more information on how to set up the Hadoop environment for masking, see https://techdocs.broadcom.com/us/en/ca-enterprise-software/devops/test-data-management/4-9/provisioning-test-data/mask-production-data-with-fast-data-masker/mask-stored-data/mask-data-stored-in-hadoop.html

  2. Create data models for data in HDFS/HIVE

    Answer - Not supported. You do not need to create a data model in TDM Datamaker or TDM Portal.

  3. Tag/mark PHI/PII data

    Answer - Not Supported.

  4. Assign masking functions

    Answer - Hadoop masking is supported through the provided JAR files, which need to be deployed to the Hadoop environment. Steps for deploying the JAR files used for the supported masking functions are outlined in the TDM Documentation. See https://techdocs.broadcom.com/us/en/ca-enterprise-software/devops/test-data-management/4-9/provisioning-test-data/mask-production-data-with-fast-data-masker/mask-stored-data/mask-data-stored-in-hadoop.html

  5. Execute Masking jobs

    Answer - Masking is executed through the Hive UDFs that the deployed JAR files include to perform the supported masking functions. See https://techdocs.broadcom.com/us/en/ca-enterprise-software/devops/test-data-management/4-9/provisioning-test-data/mask-production-data-with-fast-data-masker/mask-stored-data/mask-data-stored-in-hadoop.html
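    As an illustration, invoking one of the deployed UDFs from a Hive session might look like the sketch below. The JAR path, UDF class name, function name, and table/column names here are all hypothetical placeholders; consult the TDM documentation linked above for the actual class and function names shipped in the MaskingSDK.

```sql
-- Register the masking JAR with the Hive session (path is an example)
ADD JAR /opt/tdm/masking/MaskingUDFs.jar;

-- Bind a temporary function to a UDF class from the JAR
-- (class name below is a placeholder; see the TDM docs for real names)
CREATE TEMPORARY FUNCTION mask_string AS 'com.example.masking.MaskStringUDF';

-- Preview masked values without modifying the source table
SELECT mask_string(ssn) AS masked_ssn
FROM customers
LIMIT 10;
```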

  6. Can you point me to the latest sets of Jar files that we must install?

    Answer - The required JAR files are included in MaskingSDK-<version>.zip, which is located in the root directory of your CA TDM installation media. If you are running an older release of TDM, please open a support case requesting a copy of the latest MaskingSDK-<version>.zip file. At the time this KB was written, the latest release was MaskingSDK-4.9.105.0.zip.

  7. What level of access would the TDM person require in the environment/Edge Node/server, etc.? Also, what is the step-by-step process a user must go through, including the interface details, to mask data within Hadoop environments? Would it be possible for someone to actively guide us to set up the environment and ensure the setup is correct (steps to be performed)?

    Answer - The JAR files provide the Hive UDFs (User Defined Functions) that perform the masking. The stored Hadoop data must be structured data and must have a defined schema. Basically, the user invokes a Hive UDF (from the provided JAR files) through the Hive query language to access the structured data stored in Hadoop. The Hive UDF executes the FDM masking function, which is provided in the masking library, and the structured data is updated as a result. The documentation covers how to deploy the JAR files, and documents which FDM masking functions correspond with which Hive UDF for a given masking function. Therefore, everything is executed from Hive and does not rely on a TDM connection profile for Portal or FDM.
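    Because Hive does not update rows in place the way a relational database does, a common pattern for "updating" a table with masked output is an INSERT OVERWRITE. The sketch below uses hypothetical table, column, and function names (mask_string is assumed to be a UDF registered from the MaskingSDK JAR); see the TDM documentation for the real UDF names.

```sql
-- Hypothetical example: rewrite a table with its PII columns masked.
INSERT OVERWRITE TABLE customers_masked
SELECT
  customer_id,              -- non-sensitive columns pass through unchanged
  mask_string(first_name),  -- PII columns go through the masking UDF
  mask_string(last_name),
  mask_string(ssn)
FROM customers;
```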

    A step-by-step guide to setting up masking for your Hadoop environment, and hands-on help with the Hadoop masking, is beyond the scope of Support and would be something our Services partner HLC could help you with. You may want to reach out to your Broadcom account team to see if they can arrange Professional Services, or perhaps a Solutions Architect, to review your environment and POC requirements and provide steps for setting this up in your environment. Alternatively, you may want to reach out to the TDM Community to see if anyone has suggestions they can share for masking Hadoop (Hive) data.