Can a value be listed multiple times in an EDM and still be detected?

Article ID: 159488

Products

Data Loss Prevention Endpoint Prevent, Data Loss Prevention Network Monitor, Data Loss Prevention Network Prevent for Email, Data Loss Prevention Enforce, Data Loss Prevention Network Protect, Data Loss Prevention Endpoint Discover

Issue/Introduction

Consider the following scenario:

An EDM contains the email addresses of team managers and their direct reports. In testing, one manager has more than 10 direct reports. The email addresses of his direct reports are matched, but his own email address is not.

Cause

The EDM indexing technology distinguishes two types of terms: common and uncommon. An EDM match always requires at least one uncommon term. Enforce indexes both common and uncommon terms, but the EDM detection engine handles them differently.

When evaluating a message, the EDM detection engine first looks for occurrences of uncommon terms. For each candidate it finds, it then checks for other uncommon or common terms in close proximity and tries to find combinations of terms that belong to the same row of the source file used to build the EDM index. This design balances detection accuracy against performance: the proximity logic is processing-intensive, so EDM detection runs it only for match candidates built around terms that are uncommon in the source file.
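The two-stage process described above can be sketched in Python as follows. This is a deliberately simplified, hypothetical model, not Symantec DLP's implementation: the message is assumed to be pre-split into tokens, "close proximity" is approximated as a fixed window of tokens (the window of 5 is an arbitrary assumption), and the set of uncommon terms is passed in directly rather than computed from the source file. The names build_index and detect are illustrative only.

```python
from collections import defaultdict

# Hypothetical, simplified model of EDM detection (not DLP's actual code).


def build_index(rows):
    """Map each term to the set of source-row numbers it appears in."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        for term in row:
            index[term].add(row_id)
    return index


def detect(tokens, index, uncommon, window=5):
    """Return a set of (row_id, matched_terms) pairs for a tokenized message.

    Only an uncommon term can start a candidate. Other terms (common or
    uncommon) count toward the match only if they lie within `window`
    tokens of that uncommon term and belong to the same source row.
    """
    matches = set()
    for i, tok in enumerate(tokens):
        if tok not in uncommon:
            continue  # common or unknown terms never start a candidate
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for row_id in index.get(tok, ()):
            # Nearby tokens from the same row (includes the uncommon term itself).
            nearby = {t for t in tokens[lo:hi] if row_id in index.get(t, ())}
            matches.add((row_id, frozenset(nearby)))
    return matches


rows = [
    ("boss@example.com", "alice@example.com"),
    ("boss@example.com", "bob@example.com"),
]
index = build_index(rows)
uncommon = {"alice@example.com", "bob@example.com"}  # boss@... assumed common
print(detect(["boss@example.com"], index, uncommon))  # set()
```

In this toy data the manager's address on its own produces no candidate at all, while the same address placed next to alice@example.com is matched together with it as row 0.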

By default, a term is considered common if it appears more than 10 times in the source file used for indexing. This count is controlled by the term_commonality_threshold parameter in the Indexer.properties configuration file.
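As an illustration of the threshold rule, the following hypothetical sketch counts how often each term occurs across all cells of a source table and splits the terms into common and uncommon sets. It assumes exact string matching and ignores any normalization DLP may apply during indexing; classify_terms is an illustrative name, not a DLP API.

```python
from collections import Counter

DEFAULT_THRESHOLD = 10  # default value of term_commonality_threshold


def classify_terms(rows, threshold=DEFAULT_THRESHOLD):
    """Split the terms of a source table into (common, uncommon) sets.

    A term is common if it occurs MORE THAN `threshold` times across all
    cells of the source data; otherwise it is uncommon.
    """
    counts = Counter(term for row in rows for term in row)
    common = {t for t, n in counts.items() if n > threshold}
    uncommon = set(counts) - common
    return common, uncommon


# A manager with 11 direct reports: one row per report.
rows = [("boss@example.com", f"report{i}@example.com") for i in range(11)]
common, uncommon = classify_terms(rows)
print(sorted(common))  # ['boss@example.com']
```

Because the rule is "more than" the threshold, a manager with exactly 10 direct reports would still be uncommon in this model; the 11th row is what tips the address over.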

In this scenario, the manager's email address appears in the source data more than 10 times (once in each of his direct reports' rows), so it is a common term. Symantec DLP therefore does not match on it unless it appears in close proximity to an uncommon term from the same index and the same row of the source data, such as the email address of one of his direct reports.

Resolution

The default common-term threshold of 10 can be raised in the Indexer.properties file. If problems occur after the change, set it back to 10.

To change the threshold, edit the following setting in Indexer.properties:

# If a term appears in SDP more than this number of times,
# it is considered a common term with appropriate ramifications
# during SDP indexing and detection.
term_commonality_threshold=10
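Before raising the threshold, it can help to know how high it must be. A term stays uncommon as long as its occurrence count does not exceed term_commonality_threshold, so the minimum workable value is the highest count among the terms that need to be detected on their own. The sketch below assumes a pipe-delimited source file (adjust the delimiter to match yours) and exact string matching; min_threshold_for is a hypothetical helper, and DLP's own counting may differ, for example in how it normalizes case or whitespace.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def min_threshold_for(source: TextIO, terms: Iterable[str], delimiter: str = "|") -> int:
    """Return the smallest term_commonality_threshold at which every term in
    `terms` is still uncommon, i.e. the highest occurrence count among them.

    Assumption: cells are compared as exact strings after trimming whitespace;
    DLP's own indexer may normalize terms differently.
    """
    counts = Counter()
    for row in csv.reader(source, delimiter=delimiter):
        counts.update(cell.strip() for cell in row if cell.strip())
    # Counter returns 0 for terms that never occur.
    return max((counts[t] for t in terms), default=0)


# Hypothetical source file: a header plus one row per direct report.
data = "manager|report\n" + "".join(
    f"boss@example.com|r{i}@example.com\n" for i in range(12)
)
needed = min_threshold_for(io.StringIO(data), ["boss@example.com"])
print(needed)  # 12
```

Keep in mind that every term made uncommon this way becomes a starting point for the more expensive proximity checks described in the Cause section, which is why the value should not be raised further than necessary.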