We are masking a JSON file and are encountering a serious performance issue when there are multiple JSON objects in a single file. We have close to 2 million JSON objects in list/array format in a single file and want to mask them.
We masked a JSON file containing 10K JSON objects in array form, and it took 30 minutes, which is far too slow.
Mask Snippet:
,,FORMATENCRYPT,,,,,Y,,,,,$[*]['Claim']['ClaimNumber'],3,10,,,,,,,,,E,,,,
The other approach we took was to split each JSON object into a separate file and then mask each file. This completed in 2 minutes, which is much faster. That time included splitting, de-identification, and recombining.
Mask Snippet:
,,FORMATENCRYPT,,,,,Y,,,,,$['Claim']['ClaimNumber'],3,10,,,,,,,,,E,,,,
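The difference between the two path expressions can be sketched in plain Python. This is only an illustration of what each path addresses, not FDM's implementation: `mask_claim_number` is a hypothetical stand-in (it is not FORMATENCRYPT), and the sample claim numbers are made up.

```python
import json


def mask_claim_number(value: str) -> str:
    # Hypothetical stand-in for a masking function (NOT FDM's FORMATENCRYPT):
    # keep the first two characters and replace the rest with "X".
    return value[:2] + "X" * (len(value) - 2)


def mask_whole_array(text: str) -> str:
    # Corresponds to $[*]['Claim']['ClaimNumber']: the document is an array,
    # and the path addresses the field in every element.
    records = json.loads(text)
    for record in records:
        claim = record["Claim"]
        claim["ClaimNumber"] = mask_claim_number(claim["ClaimNumber"])
    return json.dumps(records)


def mask_single_object(text: str) -> str:
    # Corresponds to $['Claim']['ClaimNumber']: the document is one object.
    record = json.loads(text)
    claim = record["Claim"]
    claim["ClaimNumber"] = mask_claim_number(claim["ClaimNumber"])
    return json.dumps(record)


def split_mask_combine(text: str) -> str:
    # The split approach: one document per object, mask each, then recombine.
    parts = [json.dumps(obj) for obj in json.loads(text)]
    masked = [json.loads(mask_single_object(part)) for part in parts]
    return json.dumps(masked)


# Made-up sample data for illustration only.
sample = json.dumps([
    {"Claim": {"ClaimNumber": "CLM0000001"}},
    {"Claim": {"ClaimNumber": "CLM0000002"}},
])

# Both approaches should yield the same masked array.
assert mask_whole_array(sample) == split_mask_combine(sample)
```

Both routes produce the same output, so any difference in run time comes from how the file is parsed and traversed rather than from the masking rule itself.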
Since our objective is to mask 2M+ records, creating that many files slows down system I/O, so we need a way to mask the data in a single file with better performance. We believe that using "*" triggers some kind of recursion, which slows down the masking and uses a lot of memory.
Release : 4.10
By default, FDM splits the JSON file into objects.
Enabling the JASONPROCESSWHOLEFILE option (JASONPROCESSWHOLEFILE = Y) makes FDM process the whole file in one go, which should help with performance.