How Do I Improve Accuracy for Large EDMs?

Article ID: 160480

Updated On:

Products

Data Loss Prevention Endpoint Prevent
Data Loss Prevention Network Monitor
Data Loss Prevention Network Prevent for Email
Data Loss Prevention Enforce
Data Loss Prevention Network Discover
Data Loss Prevention Network Prevent for Web
Data Loss Prevention Network Protect
Data Loss Prevention for Tablets
Data Loss Prevention Endpoint Discover

Issue/Introduction

An Exact Data Match (EDM) index that contains only SSN and Last Name columns generates false positives.

An EDM with many rows and few columns will be less accurate.

Accuracy and performance for large EDM indexes can be improved by using only one large file.

Resolution

It is best to use one 300 MB EDM file rather than four 75 MB EDM files, because performance is better with a single file. Each EDM has an overhead of about 200 MB of memory, so splitting an EDM into multiple files requires about 200 MB of additional memory for each extra file.
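
The memory cost of splitting can be worked out with simple arithmetic. The sketch below is only an illustration: it assumes the approximate 200 MB per-EDM overhead quoted above is a fixed cost that adds up linearly, and the function name total_overhead_mb is hypothetical, not part of any DLP tool.

```python
# Assumption: each EDM adds roughly 200 MB of fixed memory overhead,
# the approximate figure quoted in this article. Real usage may differ.
OVERHEAD_MB_PER_EDM = 200


def total_overhead_mb(edm_count: int) -> int:
    """Approximate fixed memory overhead for edm_count separate EDM files."""
    if edm_count < 0:
        raise ValueError("edm_count must not be negative")
    return edm_count * OVERHEAD_MB_PER_EDM


single = total_overhead_mb(1)  # one 300 MB file
split = total_overhead_mb(4)   # four 75 MB files
print(f"one file:   about {single} MB overhead")
print(f"four files: about {split} MB overhead")
print(f"extra cost of splitting: about {split - single} MB")
```

Under these assumptions, four 75 MB files carry about 800 MB of overhead against about 200 MB for a single 300 MB file, even though the indexed data is the same size.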

An EDM with many rows and too few columns can cause false positives. The best way to avoid this is to increase the number of columns. Symantec DLP recommends matching on at least three columns in an EDM.

Accuracy improves with more columns, but the extra columns must be useful. If you index four columns but match on only two, accuracy does not improve, and performance can suffer because the index file is larger than it needs to be.
Ensure that you use all the columns that you index.
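
One way to check this is to compare the list of indexed columns against the columns your policy actually matches on. The sketch below is a minimal illustration that assumes you maintain both lists by hand. The column names and the unused_columns helper are hypothetical, and nothing here is read from or calls into Symantec DLP.

```python
def unused_columns(indexed, matched):
    """Return the indexed columns that no policy condition matches on."""
    matched_set = set(matched)
    return [column for column in indexed if column not in matched_set]


# Hypothetical example: four columns indexed, only two used for matching.
indexed = ["SSN", "Last Name", "First Name", "Account Number"]
matched = ["SSN", "Last Name"]

leftover = unused_columns(indexed, matched)
if leftover:
    print("Indexed but unused columns:", ", ".join(leftover))
else:
    print("All indexed columns are used.")
```

Any column this reports is a candidate either for adding to the match conditions or for dropping from the index.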