Note: This is for version 12.0 and below. Version 12.5 introduces Multi-token EDMs.
When processing a document or message, text is broken into tokens. In most cases, one word becomes a single token.
When an EDM is created, a data source value that contains a space, such as "United States", is indexed as one multi-word token, so the processor looks for the whole string "United States". On the detection side, however, "United States" in non-tabular message content is broken into two tokens, "United" and "States". Neither token matches the indexed token "United States", so the value is not detected and a false negative can result.
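The mismatch can be illustrated with a small sketch. It assumes a plain whitespace tokenizer and a hypothetical in-memory index; the product's actual tokenizer and index format are not described here, so treat this only as a model of the behavior explained above.

```python
# Sketch only: assumes a simple whitespace tokenizer, not the product's own.
PUNCT = ".,;:!?\"'"

def tokenize(text: str) -> list[str]:
    """Split message text on whitespace and strip surrounding punctuation."""
    tokens = (t.strip(PUNCT) for t in text.split())
    return [t for t in tokens if t]

# Hypothetical EDM index: each cell value is stored as ONE token,
# so "United States" keeps its internal space.
indexed_tokens = {"United States", "US"}

def matches(message: str) -> set[str]:
    """Return the indexed tokens that appear as whole tokens in the message."""
    return indexed_tokens.intersection(tokenize(message))

print(matches("Shipping address: Denver, United States."))  # set()
print(matches("Shipping address: Denver, US."))              # {'US'}
```

The message is split into "United" and "States", neither of which equals the single indexed token "United States", while the one-word value "US" matches directly.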
To improve EDM accuracy when creating an EDM:
- Avoid values that contain spaces. For example, use "US" instead of "United States".
- Split names into First Name and Last Name columns; do not put full names in a single column.
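One way to apply these recommendations is to preprocess the data source before indexing. The sketch below assumes a CSV source with hypothetical "Full Name" and "Country" columns and a hypothetical abbreviation map; adapt the column names and mappings to your own data.

```python
import csv
import io

# Hypothetical abbreviation map; extend it to cover your own data source.
ABBREVIATIONS = {"United States": "US", "United Kingdom": "UK"}

def prepare_row(row: dict[str, str]) -> dict[str, str]:
    """Rewrite one data-source row so that every cell is a single token."""
    parts = row["Full Name"].split()
    first = parts[0] if parts else ""
    # Sketch only: middle names are dropped, and multi-word surnames
    # would still need separate handling.
    last = parts[-1] if len(parts) > 1 else ""
    country = ABBREVIATIONS.get(row["Country"], row["Country"])
    return {"First Name": first, "Last Name": last, "Country": country}

def has_spaces(row: dict[str, str]) -> bool:
    """Flag any cell that would still be indexed as a multi-word token."""
    return any(" " in value for value in row.values())

source = io.StringIO("Full Name,Country\nJane Doe,United States\n")
rows = [prepare_row(r) for r in csv.DictReader(source)]
print(rows)  # [{'First Name': 'Jane', 'Last Name': 'Doe', 'Country': 'US'}]
print([has_spaces(r) for r in rows])  # [False]
```

The `has_spaces` check is a cheap way to catch values, such as an unmapped country name, that would still be indexed as multi-word tokens.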
An exception to this rule applies to the following data patterns, which are recognized the same way during both indexing and detection:
- Social Security Number
• using dashes: 111-22-3333
• using spaces: 111 22 3333
• no delimiters: 111223333
- Credit Card Number: numerous patterns, each tailored to a specific card issuer such as Visa, Mastercard, or American Express
• using dashes: 4444-2222-1111-4444
• using spaces between any of the groupings, e.g. 4444 2222 1111 4444, 44442222 11114444, or 444422221111 4444
• no spaces: 4444222211114444
- Phone Number: currently patterned only for US and Canadian numbers
• with an area code, with or without parentheses, e.g. (415)111-2222, 415111-2222, or 415 111-2222
• without an area code, e.g. 111-2222
• without dashes, e.g. 1112222
• with the US country code, e.g. 1(415)111-2222
• with spaces instead of dashes, e.g. 415 111 2222
• with dots as separators, e.g. 415.111.2222
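To make the Social Security Number and credit card variants above concrete, here is a sketch of recognizers that accept those delimiter styles and reduce each match to a canonical digit string. The regular expressions are hypothetical stand-ins; the product's real patterns, including issuer-specific card rules (prefixes, lengths such as 15-digit American Express numbers, checksum validation), are not documented here.

```python
import re

# Hypothetical SSN recognizer: 3-2-4 digits. The back-reference (\1) forces
# the same delimiter (dash, space, or none) in both positions.
SSN = re.compile(r"\b\d{3}([- ]?)\d{2}\1\d{4}\b")

# Hypothetical generic 16-digit card recognizer: four groups of four, where
# each gap may be a dash, a space, or nothing. Not issuer-specific.
CARD16 = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")

def canonical(value: str) -> str:
    """Drop delimiters so every variant compares as the same digit string."""
    return re.sub(r"\D", "", value)

for text in ["111-22-3333", "111 22 3333", "111223333"]:
    m = SSN.search(text)
    print(text, "->", canonical(m.group()) if m else None)

for text in ["4444-2222-1111-4444", "4444 2222 1111 4444",
             "44442222 11114444", "444422221111 4444", "4444222211114444"]:
    m = CARD16.search(text)
    print(text, "->", canonical(m.group()) if m else None)
```

Because the recognizer runs on the raw text rather than on individual tokens, a spaced value such as 4444 2222 1111 4444 is still seen as one number.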
Note that the format used when indexing does not need to match the format that appears during detection. For example, an indexed value of (415)111-2222 matches 1 415 111 2222 in a message.
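Format-independent matching like this can be modeled by comparing canonical forms instead of raw strings. The `canonical_phone` helper below is hypothetical: it assumes US/Canadian numbers, strips every non-digit, and drops a leading country code 1. It is not the product's documented algorithm.

```python
import re

def canonical_phone(value: str) -> str:
    """Hypothetical normalization for US/Canadian numbers.

    Strip every non-digit, then drop a leading country code 1
    from an 11-digit result.
    """
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits

indexed = canonical_phone("(415)111-2222")  # '4151112222'
for seen in ["1 415 111 2222", "415.111.2222", "1(415)111-2222", "415 111-2222"]:
    print(seen, canonical_phone(seen) == indexed)  # True for each
```

Normalizing both the indexed value and the detected value to the same form is what lets one indexed format match any of the variants listed above.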