EMDI Policy false positives

Article ID: 199060

Products

Data Loss Prevention Enterprise Suite
Data Loss Prevention Endpoint Prevent
Data Loss Prevention
Data Loss Prevention Core Package

Issue/Introduction

An EMDI (Exact Match Data Identifier) policy is generating matches only on the regex pattern (the Custom Data Identifier created specifically for EMDI), instead of on the EMDI indexed data.

Cause

Because EMDI is intended for Endpoint Agents, the number of "calls" (lookups) to the EMDI index is capped for performance reasons. In practice, this cap limits how many matches to the Custom DI in any given file submitted for detection are checked against the index.

By default, the max number of EMDI lookups is 10000.
This is set in the agent configuration > Advanced Settings tab, in the "Detection.MAX_EMDI_LOOKUPS.int" setting.
Be advised that:
"Increasing the limit above the default value of 10000 increases the likelihood of false positives and performance degrades linearly. For example, a setting of 20000 is twice as slow as a setting of 10000."

When this max is reached, the Endpoint Agent log should show the following entry:

Level: WARNING
Source: Detection.EMDICheck
Message: Reached the max number of EMDI lookups(10000)
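
To confirm whether an agent hit the cap, the log can be searched for the warning text shown above. The following is a minimal sketch in Python, assuming the Endpoint Agent log has been exported to plain text. The function name and the sample string are hypothetical; only the message text comes from the entry above, and the real log layout may differ.

```python
import re

# Message text taken from the log entry above. The number in parentheses
# is the configured Detection.MAX_EMDI_LOOKUPS.int value.
CAP_MESSAGE = re.compile(r"Reached the max number of EMDI lookups\((\d+)\)")


def find_emdi_cap_warnings(log_text):
    """Return the lookup limit reported by each cap warning found in log_text."""
    return [int(m.group(1)) for m in CAP_MESSAGE.finditer(log_text)]


# Hypothetical sample mirroring the three fields shown in this article.
sample = (
    "Level: WARNING\n"
    "Source: Detection.EMDICheck\n"
    "Message: Reached the max number of EMDI lookups(10000)\n"
)
print(find_emdi_cap_warnings(sample))
```

An empty result suggests the cap was not reached for the files scanned in that log, so the unexpected matches would have another cause.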

If the number of matches to the Custom Data Identifier exceeds this limit, no further calls are made to the EMDI component index, and ALL DI matches beyond that number become part of the incident without being checked against the indexed data.
The lookup limit appears to be shared across validators: for example, if you have deployed 50 EMDI validators, only the first 200 DI matches (10000 / 50) will be examined and possibly returned as an incident.
However, the policy will continue to scan the file, and all DI matches beyond that point will be included in the incident.
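
The arithmetic behind the 50-validator example can be sketched as follows. This is an illustration only, not the agent's implementation: it assumes each DI match consumes one lookup per deployed validator, which is inferred from the 10000 / 50 = 200 figure above. Both function names are hypothetical.

```python
def examined_matches(max_lookups, validators):
    """DI matches that can be checked against the EMDI index before the cap.

    Assumption (inferred from the article's example): each DI match uses
    one lookup per deployed validator.
    """
    if validators < 1:
        raise ValueError("validators must be at least 1")
    return max_lookups // validators


def split_matches(total_matches, max_lookups=10000, validators=1):
    """Return (checked, unchecked) counts for a file with total_matches DI matches."""
    limit = examined_matches(max_lookups, validators)
    checked = min(total_matches, limit)
    # Matches past the limit are never looked up in the index, so they
    # would land in the incident without EMDI validation.
    return checked, total_matches - checked


print(examined_matches(10000, 50))
print(split_matches(1000, validators=50))
```

Under this assumption, a file with 1000 DI matches and 50 validators would have 200 matches checked against the index and 800 included in the incident unchecked.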

For example, a Custom DI that only looks for 5-digit number strings will frequently generate a large number of matches. This can exceed the cap and create what appear to be false positives (data that is not part of the indexed EMDI profile).

Resolution

The Custom DI tied to an EMDI policy should be tuned to more closely match the requirements of the dataset being analyzed.
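
As a rough illustration of how tightening the pattern reduces the match count, the sketch below compares three patterns against the same text using Python's `re` module. The sample text and the "ACCT-" prefix are hypothetical, and the pattern syntax accepted by the Custom DI editor may differ from Python's, so treat this as a way to reason about match volume rather than as patterns to paste into a policy.

```python
import re

# Hypothetical sample content; not taken from any real dataset.
text = (
    "Order 12345 shipped to ZIP 90210 on 20240115, "
    "ref ACCT-54321, phone 5551234567"
)

# Any 5 consecutive digits, including fragments of longer numbers.
loose = re.compile(r"\d{5}")
# Only standalone 5-digit tokens.
bounded = re.compile(r"\b\d{5}\b")
# Hypothetical dataset-specific format: a fixed prefix before the digits.
prefixed = re.compile(r"\bACCT-\d{5}\b")


def count_matches(pattern, content):
    """Count non-overlapping matches of a compiled pattern in content."""
    return sum(1 for _ in pattern.finditer(content))


for name, pattern in (("loose", loose), ("bounded", bounded), ("prefixed", prefixed)):
    print(name, count_matches(pattern, text))
```

The fewer incidental matches the Custom DI produces per file, the less likely a file is to reach the lookup cap before every match has been checked against the index.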