An EMDI (Exact Match Data Identifier) policy is generating matches only on the regex pattern (the Custom Data Identifier created specifically for EMDI) instead of on the EMDI indexed data.
Because EMDI is intended for Endpoint Agents, calls to the EMDI index are capped for performance reasons. The number of calls is essentially the number of matches to the Custom DI in any given file submitted for detection.
That cap is 2500 and is hardcoded (not configurable) as of DLP 15.7. In addition, if the Data Identifier has more than one EMDI validator, the 2500 maximum is divided by the number of validators. For example, with 3 EMDI validators in the policy condition, the total number of lookups would be 2500 / 3, or 833.
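The arithmetic above can be sketched as a small helper. This is an illustration only, not product code: the function name is hypothetical, and rounding down is an assumption chosen because it reproduces the 2500 / 3 = 833 figure.

```python
# Documented cap on EMDI index lookups per file (hardcoded as of DLP 15.7).
EMDI_LOOKUP_CAP = 2500


def effective_lookup_cap(validator_count: int) -> int:
    """Return the lookup budget when the cap is divided among EMDI validators.

    Hypothetical helper. Assumes the result is rounded down, which matches
    the 2500 / 3 = 833 example; the product's exact rounding is not stated.
    """
    if validator_count < 1:
        raise ValueError("at least one EMDI validator is required")
    return EMDI_LOOKUP_CAP // validator_count


for n in (1, 2, 3):
    print(n, effective_lookup_cap(n))
```

With one validator the full 2500 lookups are available; each additional validator shrinks the budget, so a broad Custom DI reaches the cap sooner.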
When this maximum is reached, the Endpoint Agent log should show the following entry at the FINEST log level:
**** WARNING | Detection.EMDICheck | Reached the max number of EMDI lookups(2500) | EMDICheck.cpp(124) ****
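To check whether a given file hit the cap, you can search the agent log for the warning text. The sketch below is a minimal, hypothetical scanner: it simply looks for the substring quoted in the entry above, and the function names are illustrative. The log file location depends on the installation and is not assumed here.

```python
# Substring of the FINEST-level warning quoted above.
EMDI_CAP_WARNING = "Reached the max number of EMDI lookups"


def find_cap_warnings(lines):
    """Yield (line_number, line) for each line containing the EMDI cap warning."""
    for number, line in enumerate(lines, start=1):
        if EMDI_CAP_WARNING in line:
            yield number, line.rstrip("\n")


def scan_log(path):
    """Hypothetical helper: return all cap warnings found in one log file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(find_cap_warnings(handle))
```

Call `scan_log()` with the path to an Endpoint Agent log captured with FINEST logging; an empty result means the warning was not logged in that file.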
If the number of matches to the Custom Data Identifier exceeds this cap, no further calls are made to the EMDI component index, and all DI matches beyond that number become part of the incident.
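The behavior described above can be modeled roughly as follows. This is a simplified, hypothetical model for illustration, not the actual detection implementation: matches within the cap are kept only if they are found in the index, and every match past the cap is kept without a lookup.

```python
from typing import Iterable, List, Set


def simulate_emdi_detection(
    di_matches: Iterable[str], indexed_values: Set[str], lookup_cap: int
) -> List[str]:
    """Return the matches that end up in the incident under a simplified model."""
    incident = []
    for position, value in enumerate(di_matches):
        if position < lookup_cap:
            # Within the cap: the match is looked up in the EMDI index.
            if value in indexed_values:
                incident.append(value)
        else:
            # Past the cap: no lookup is made, so the raw DI match is kept.
            incident.append(value)
    return incident


print(simulate_emdi_detection(["11111", "22222", "33333", "44444"], {"11111"}, 2))
```

With a cap of 2, "22222" is correctly dropped because it is not indexed, but "33333" and "44444" are reported even though they are not indexed either. These are the apparent false positives described next.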
For example, a Custom DI that looks only for 5-digit number strings often generates a large number of matches. These can exceed the cap and produce what appear to be false positives (data that is not part of the indexed EMDI profile).
Release: 15.7
Component: Symantec Data Loss Prevention Endpoint Agent and Detection Servers
Tune the Custom DI tied to the EMDI policy so that its pattern more closely matches the format of the indexed data. This reduces the number of matches per file and keeps the lookups below the cap.
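As a rough illustration of tuning, the sketch below compares a broad 5-digit pattern with a narrower one. Python's `re` syntax stands in for the Custom DI regex syntax, which may differ, and the "EMP-" prefix is a hypothetical assumption that the indexed values are employee IDs written as "EMP-12345".

```python
import re

# Broad pattern: any standalone 5-digit string.
BROAD_PATTERN = re.compile(r"\b\d{5}\b")
# Hypothetical tuned pattern: assumes indexed values look like "EMP-12345".
TUNED_PATTERN = re.compile(r"\bEMP-\d{5}\b")

sample = "Order 55501 shipped to ZIP 94107; employee EMP-12345 approved invoice 00042."

print(len(BROAD_PATTERN.findall(sample)))  # order number, ZIP, ID digits, invoice
print(len(TUNED_PATTERN.findall(sample)))  # only the employee ID
```

The broad pattern matches 4 values in this sample, and only one of them is a plausible indexed value. The tuned pattern matches just the employee ID, so far fewer EMDI lookups are spent per file.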
An enhancement to this feature, planned for a future release, will allow the cap on matches to be increased.