Testing multi-token EDM policies
Example:
While testing an IBAN policy, I created indexes for IBAN numbers in two different formats: one without blanks and one with blanks.
CH660076xxxx1005xxxx3 (without blanks)
CH66 0076 xxxx 1005 xxxx 3 (with blanks)
The policy is created with rules to detect either of these two formats.
I set up the server configuration for the blanks in the second IBAN format in the indexed content, with the WIP setting set to false (Lexer.
The index appears to have been created correctly:
com.vontu.profileindexer.database.NativeStatisticsBuilder@6a4f787b
Cryptographic key used: EXTERNAL.1
Single Token Uncommon cells: 230322
, Single Token Uncommon cell lists: 0
, Single Token common cells: 0
, Single Token common cell lists: 0
, Multi Token Uncommon cells: 230322
, Multi Token Uncommon cell lists: 0
, Multi Token common cells: 0
, Multi Token common cell lists: 0,
Elapsed time: 5133 milliseconds.
Successfully created index
In every test email I included three different IBAN numbers in two different formats, without blanks and with blanks, so that the numbers could be detected using multi-token punctuation characters. Almost all of them work correctly (see the generated incidents),
EXCEPT these four, which use %, $, dot and dash as separators.
In those cases, only the IBAN format without blanks is detected.
Multi-token policies with punctuation are important to us, because we mostly want to detect IBAN numbers that use dot and dash as punctuation characters.
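The separator variants used in the test emails can be generated rather than typed by hand, which makes it easier to keep the test messages consistent. Below is a minimal Python sketch, assuming the usual print grouping of four characters per block; `group_iban` and `iban_variants` are hypothetical helper names, and the sample value is the CH66 number used in the examples further down.

```python
SEPARATORS = ("", " ", ".", "-", "%", "$")  # "" yields the compact form


def group_iban(iban: str, sep: str = " ", size: int = 4) -> str:
    """Drop existing blanks, then rejoin the IBAN in blocks of `size` characters."""
    compact = "".join(iban.split())
    return sep.join(compact[i:i + size] for i in range(0, len(compact), size))


def iban_variants(iban: str, separators=SEPARATORS) -> dict[str, str]:
    """Map each separator to the IBAN written with that separator."""
    return {sep: group_iban(iban, sep) for sep in separators}


if __name__ == "__main__":
    # Print one line per variant, e.g. the "$"-separated form for the test email.
    for sep, value in iban_variants("CH66 0076 6000 2005 3066 3").items():
        print(repr(sep), value)
```

Pasting each printed line into a test message gives one candidate per separator, so a missed detection can be traced to a single separator.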
For the ones in the second group:
CH66$0076$6000$2005$3066$3 is most likely recognized as a group of currency amounts ($6000, $2005, $3066, $3), so it won't match.
CH66%0076%6000%2005%3066%3 is most likely recognized as a group of percentages (6000%, 2005%, 3066%), so it won't match.
I am not sure about CH66-0076-6000-2005-3066-3, as part of it might be recognized as a telephone number or a credit card number. The lexer is quite complex, so it is time-consuming to trace it in your head; you would have to set up a debug system and look at how each token is parsed.
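To illustrate the currency/percentage explanation, here is a toy Python lexer. It is a hypothetical stand-in, not the DLP lexer: the token rules in `TOKEN_RE` and the names `tokenize` and `multi_token_match` are invented for illustration. It assumes that a multi-token match succeeds only when the message yields the indexed tokens consecutively and verbatim, so a currency token such as `$6000` or a percentage token such as `6000%` breaks the sequence. It deliberately does not model the dot or dash cases.

```python
import re

# Hypothetical token rules for illustration only; this is NOT the DLP lexer.
# At each position the alternatives are tried left to right.
TOKEN_RE = re.compile(
    r"""
      (?P<currency>\$\d+)      # "$6000" becomes one currency token
    | (?P<percent>\d+%)        # "6000%" becomes one percentage token
    | (?P<word>[A-Za-z0-9]+)   # plain alphanumeric run, e.g. "CH66"
    """,
    re.VERBOSE,
)

# The grouped IBAN as it would be split into tokens in the multi-token index.
INDEXED = ["CH66", "0076", "6000", "2005", "3066", "3"]


def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs; characters matched by no rule act as separators."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def multi_token_match(indexed: list[str], message: str) -> bool:
    """True if the indexed tokens appear consecutively and verbatim in the message."""
    texts = [text for _, text in tokenize(message)]
    n = len(indexed)
    return any(texts[i:i + n] == indexed for i in range(len(texts) - n + 1))


if __name__ == "__main__":
    for sample in ("CH66 0076 6000 2005 3066 3",
                   "CH66$0076$6000$2005$3066$3",
                   "CH66%0076%6000%2005%3066%3"):
        print(sample, multi_token_match(INDEXED, sample), tokenize(sample))
```

With these toy rules the blank-separated sample matches, while the `$` and `%` samples produce tokens like `$0076` or `0076%` that no longer equal the indexed tokens.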
If the user needs all those combinations to be matched, the only workaround found thus far is to set Lexer.
Since the account numbers are fairly long and have a specific structure, have you considered using DIs (data identifiers)? If the numbers also have to follow some logic, you can add validators to reduce false positives.
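For IBANs, a typical validator is the standard check-digit test (ISO 13616, using ISO 7064 MOD 97-10): move the first four characters to the end, replace letters with numbers (A=10 ... Z=35), and require the result mod 97 to equal 1. Below is a minimal Python sketch of that test; `iban_checksum_ok` is a hypothetical name, and whether the product's built-in validator behaves exactly like this is an assumption. It strips separators first, so the blank, dot, dash, `%` and `$` variants of one number all validate alike. The samples are widely published example IBANs.

```python
import re


def iban_checksum_ok(candidate: str) -> bool:
    """Hypothetical IBAN validator: ISO 13616 check digits (MOD 97-10).

    Any non-alphanumeric separator (blank, '.', '-', '%', '$') is stripped
    first, so every grouped variant of the same IBAN gets the same result.
    """
    compact = re.sub(r"[^A-Za-z0-9]", "", candidate).upper()
    # Basic shape: 2-letter country code, 2 check digits, total length 15..34.
    if not (15 <= len(compact) <= 34
            and compact[:2].isalpha() and compact[2:4].isdigit()):
        return False
    # Move country code + check digits to the end, then map A..Z to 10..35.
    rearranged = compact[4:] + compact[:4]
    as_digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(as_digits) % 97 == 1


if __name__ == "__main__":
    for sample in ("GB82 WEST 1234 5698 7654 32",
                   "CH93-0076-2011-6238-5295-7",
                   "CH93.0076.2011.6238.5295.8"):  # last digit altered
        print(sample, iban_checksum_ok(sample))
```

A check like this rejects most random digit strings of the right length, which is what keeps false positives down when the pattern itself is loose.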