DLP Best practices for using Data and Document profiles


Article ID: 174457


Updated On:


Data Loss Prevention, Data Loss Prevention Enforce


Symantec Data Loss Prevention (DLP)

Exact Data Matching (EDM) is the most accurate form of detection. It is also the most complex to set up and maintain. To ensure that your EDM policies are as accurate as possible, consider the recommendations in this document when you implement your EDM profiles and policies.

Indexed Document Matching (IDM) is designed to protect document content and images. IDM relies on an index of fingerprinted documents to perform partial and derivative text-based content matching. You can also use IDM to match indexed documents exactly, based on their binary stamp. Exact matching covers not only text-based documents but also graphics and media files.

Because IDM supports a broad range of matching, follow the best practices in this document to implement IDM policies that accurately match the data you want to protect.


EDM Best Practices:

  • Re-index EDM profiles after upgrade.
  • Ensure that the data source file contains at least one column of unique data.
  • Eliminate duplicate rows and blank columns before indexing.
  • To reduce false positives, avoid single characters, quotes, abbreviations, numeric fields with fewer than five digits, and dates.
  • Understand multi-token indexing and clean up as necessary.
  • Use the pipe (|) character to delimit columns in your data source.
  • Review an example of a cleansed data source file.
  • Map data source columns to system fields to leverage validation during indexing.
  • Leverage EDM policy templates whenever possible.
  • Include the column headers as the first row of the original data source file.
  • Check the system alerts to tune Exact Data Profiles.
  • Use stop words to exclude common words from matching.
  • Automate profile updates with scheduled indexing.
  • Match on two or three columns in an EDM rule.
  • Leverage exception tuples to avoid false positives.
  • Use a where clause to detect the records that meet a specific criterion.
  • Use the minimum matches field to fine-tune EDM rules.
  • Consider using Data Identifiers in combination with EDM rules.
  • Include an email address field in the Exact Data Profile for profiled DGM.
  • Use profiled DGM for Network Prevent for Web identity detection.
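
Several of the data source recommendations above (a header row, no duplicate rows, no blank columns, pipe-delimited columns, at least one unique column) can be checked before indexing. The following is a minimal sketch of such a pre-indexing cleanup and is not part of the DLP product. The function names (cleanse_rows, unique_columns, is_risky_value, to_pipe_delimited) and the sample records are hypothetical and chosen only for illustration.

```python
def cleanse_rows(rows):
    """Return header-first rows with blank columns and duplicate rows removed."""
    rows = [list(r) for r in rows if r]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    # Pad short rows so that every row has the same number of columns.
    rows = [r + [""] * (width - len(r)) for r in rows]
    header, data = rows[0], rows[1:]
    # Keep a column only if at least one data row has a non-blank value in it.
    keep = [i for i in range(width) if any(r[i].strip() for r in data)]
    cleaned = [[header[i].strip() for i in keep]]
    seen = set()
    for r in data:
        record = tuple(r[i].strip() for i in keep)
        # Skip rows that are entirely blank or that repeat an earlier row.
        if not any(record) or record in seen:
            continue
        seen.add(record)
        cleaned.append(list(record))
    return cleaned


def unique_columns(rows):
    """Return the header names of columns whose values never repeat."""
    header, data = rows[0], rows[1:]
    return [
        name
        for i, name in enumerate(header)
        if data and len({r[i] for r in data}) == len(data)
    ]


def is_risky_value(value):
    """Flag single characters and numeric values with fewer than five digits."""
    v = value.strip()
    return len(v) == 1 or (v.isdigit() and len(v) < 5)


def to_pipe_delimited(rows):
    """Serialize rows with the pipe character as the column delimiter."""
    for r in rows:
        for field in r:
            if "|" in field:
                raise ValueError("field contains the delimiter: " + repr(field))
    return "\n".join("|".join(r) for r in rows) + "\n"


# Hypothetical sample data: one duplicate row and one blank column.
sample = [
    ["first", "last", "account", "notes"],
    ["Ann", "Lee", "123456789", ""],
    ["Ann", "Lee", "123456789", ""],
    ["Bob", "Lee", "987654321", " "],
]
cleaned = cleanse_rows(sample)
print(to_pipe_delimited(cleaned), end="")
print(unique_columns(cleaned))
```

A check such as is_risky_value only covers the two mechanical cases (single characters and short numeric values). Abbreviations, quotes, and dates still need review against the actual data.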

IDM Best Practices:

  • Re-index IDM profiles after upgrade.
  • Do not compress any documents that contain content you want to fingerprint.
  • Prefer partial matching over exact matching on the DLP Agent.
  • Only index the text-based documents that have content.
  • Be aware of the limitations of exact matching.
  • Use whitelisting to exclude partial file contents from matching and reduce false positives.
  • Filter non-critical documents from indexing to reduce false positives.
  • Use remote indexing for large document sets.
  • Create separate collections for each set of documents over 1,000,000 files, with all files in their unencapsulated, uncompressed state.
  • Set up separate IDM profiles if indexing more than 1,000,000 documents. You can change the index max size per IDM profile to index more than 1,000,000 documents. However, it is usually less resource-intensive to set up separate profiles.
  • Use scheduled indexing to automate profile updates.
  • Use multiple IDM rules in parallel to establish and tune match thresholds.
  • Ensure that you have appropriate hardware resources if indexing many files. You need an additional 2 GB of RAM per 1,000,000 files on each detection server.
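
The sizing guidance above (separate profiles beyond 1,000,000 documents, and an additional 2 GB of RAM per 1,000,000 files on each detection server) reduces to simple arithmetic. The sketch below is a rough planning aid only. The function name plan_idm_profiles is hypothetical, and the pro rata RAM scaling for partial millions is an assumption rather than a documented rule.

```python
MAX_DOCS_PER_PROFILE = 1_000_000   # threshold taken from the guidance above
EXTRA_RAM_GB_PER_MILLION = 2       # additional RAM per 1,000,000 files


def plan_idm_profiles(total_documents):
    """Estimate the IDM profile count and extra RAM per detection server.

    Assumption: RAM scales linearly with the number of indexed files.
    """
    if total_documents < 0:
        raise ValueError("total_documents must not be negative")
    # Integer ceiling division; always plan for at least one profile.
    profiles = max(1, -(-total_documents // MAX_DOCS_PER_PROFILE))
    extra_ram_gb = EXTRA_RAM_GB_PER_MILLION * total_documents / MAX_DOCS_PER_PROFILE
    return profiles, extra_ram_gb


profiles, extra_ram_gb = plan_idm_profiles(2_500_000)
print(profiles, extra_ram_gb)  # prints: 3 5.0
```

Under these assumptions, 2,500,000 documents would be split across three profiles and call for about 5 GB of additional RAM on each detection server.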

Data and Document Profiles in the Cloud Best Practices:

Best practices for these profiles in the cloud are the same as for on-premises detection servers. With Cloud Detectors, however, all two-tier indexes must be free from errors, at least for the first profile that is uploaded to a new Detector. This caution also applies to Active Directory indexes, which are stored as an EDM profile.