Regular expressions, especially poorly written ones, can degrade the performance of Symantec Data Loss Prevention (DLP). Learn how to write more efficient regular expressions.
Regular expressions are much slower than Data Identifiers, so use a Data Identifier whenever possible. If a Data Identifier does not fit the needs of a particular policy, regular expressions are still available.
Syntax | Meaning |
+ | Following an element, matches 1 or more occurrences of it |
- | Range within a character class; Example [a-z] |
* | Following an element, matches 0 or more occurrences of it (any number) |
? | Following an element, matches 0 or 1 occurrence of it |
\ | Escape; makes the next special character literal; Example \. \* \+ \? |
\d | Any digit character (0-9) |
\D | Any non-digit character |
\w | Word character (a-z, A-Z, 0-9, _) |
\W | Non-word character |
\s | Any whitespace character |
\S | Any non-whitespace character |
[ ] | Character class brackets; matches any one of the enclosed characters |
[a-z] | Lowercase alphabet |
[A-Z] | Uppercase alphabet |
[%*.#$@-] | Symbols (matched literally inside the square brackets) |
^ | As the first character within a character class, negates the class; Example [^0-9] |
(?: ) | Groups regular expressions together without capturing |
(?i) | Case insensitive |
(?u) | Makes a period (.) match even newline characters |
| | Pipe character; means OR |
(?=(?:[^-\w])|$) | Enhanced Look Ahead (DLP 14.6) |
(?<=(^|(?:[^)+\d][^-\w+]))) | Enhanced Look Behind (DLP 14.6) |
(?<=(^|(?:[^)+\d][^-\w+])|\t)) | Enhanced Look Behind (DLP 14.6) |
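To see how several of the table entries behave, the following sketch runs them through Java's java.util.regex engine. This is an assumption: Java's syntax overlaps with what the table describes, but the DLP detection engine is a separate implementation, so test any pattern in DLP itself before relying on it. The class name RegexTableDemo and its fullMatch helper are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: exercises several table entries with java.util.regex.
public class RegexTableDemo {

    // Returns true only if the entire input matches the regex.
    public static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // + : one or more digits
        System.out.println(fullMatch("\\d+", "12345"));                  // true
        System.out.println(fullMatch("\\d+", ""));                       // false
        // ? : optional hyphen between two digit groups
        System.out.println(fullMatch("\\d+-?\\d+", "555-1234"));         // true
        // [^ ] : negated character class, no digits allowed
        System.out.println(fullMatch("[^0-9]+", "abc"));                 // true
        // (?i) : case-insensitive match
        System.out.println(fullMatch("(?i)secret", "SeCrEt"));           // true
        // (?: ) and | : non-capturing group containing alternatives
        System.out.println(fullMatch("(?:confidential|secret) report", "secret report")); // true
        // \. : escaped period matches only a literal dot
        System.out.println(fullMatch("a\\.b", "axb"));                   // false
        System.out.println(fullMatch("a\\.b", "a.b"));                   // true
    }
}
```

Note that in Java source code every regex backslash is doubled inside the string literal, so \d in the table is written "\\d".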
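Summary: This pattern matches "file", followed by between 0 and 10 characters, followed by ".txt". It matches filereader.txt, but not filewaytoolongofaname.txt. Bounding the repetition with {0,10} also bounds how far the engine scans, whereas .* can consume everything that follows before backtracking to look for .txt.
Best Practice: file.{0,10}\.txt
Not Best Practice: file.*\.txt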
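The bounded and unbounded versions can be compared with a short sketch. It assumes Java's java.util.regex as a stand-in for the DLP engine; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo: bounded vs. unbounded repetition between "file" and ".txt".
public class BoundedQuantifierDemo {

    // Best practice: at most 10 characters between "file" and ".txt".
    public static final Pattern BOUNDED = Pattern.compile("file.{0,10}\\.txt");

    // Not best practice: any number of characters between "file" and ".txt".
    public static final Pattern UNBOUNDED = Pattern.compile("file.*\\.txt");

    // find() looks for the pattern anywhere in the text, not only as a whole-string match.
    public static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found(BOUNDED, "filereader.txt"));              // true
        System.out.println(found(BOUNDED, "filewaytoolongofaname.txt"));   // false
        System.out.println(found(UNBOUNDED, "filewaytoolongofaname.txt")); // true
    }
}
```

In Java, . does not match line terminators by default, so .* runs to the end of the current line and then backtracks one character at a time looking for .txt; the bounded version gives up after at most 10 characters.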
Notes: The ^ and $ in these expressions match the beginning or end of the message component being scanned. For a header, for example, ^ matches only at the beginning of the message header. Keep this in mind, or the expressions may not produce the results you expect.
Begin: (?<=(^|(?:[^)+\d][^-\w+])))
End: (?=(?:[^-\w])|$)
Example (two consecutive digits between the Begin and End expressions): (?<=(^|(?:[^)+\d][^-\w+])))\d\d(?=(?:[^-\w])|$)
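The following sketch wraps a core pattern (\d\d) in the Begin and End expressions and lists every match it finds. It uses Java's java.util.regex, which accepts a lookbehind whose alternatives have different bounded lengths; that engine is an assumed stand-in for the DLP 14.6 detection engine, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: wraps a core pattern in the enhanced Begin/End lookarounds.
public class EnhancedLookaroundDemo {

    // Begin: start of input, or two characters where the first is not ), + or a digit
    // and the second is not -, a word character or +.
    public static final String BEGIN = "(?<=(^|(?:[^)+\\d][^-\\w+])))";
    // End: the next character is not - or a word character, or the input ends.
    public static final String END = "(?=(?:[^-\\w])|$)";

    public static Pattern wrap(String core) {
        return Pattern.compile(BEGIN + core + END);
    }

    // Collects every non-overlapping match in the text.
    public static List<String> findAll(Pattern p, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        Pattern twoDigits = wrap("\\d\\d");
        System.out.println(findAll(twoDigits, "42"));            // [42]
        System.out.println(findAll(twoDigits, "age: 42 years")); // [42]
        System.out.println(findAll(twoDigits, "1234"));          // []
        System.out.println(findAll(twoDigits, "ab-42"));         // []
    }
}
```

The last two inputs show the boundaries at work: in 1234 every pair of digits touches another digit, and in ab-42 the hyphen directly before 42 is rejected by the lookbehind.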
The following links discuss regular expression performance and offer ideas for improving your regular expressions:
PCRE - Perl Compatible Regular Expressions
Runaway Regular Expressions: Catastrophic Backtracking
Regex Optimization Using Atomic Grouping
Some basic guidelines on how to optimize regular expressions:
http://www.javaworld.com/javaworld/jw-09-2007/jw-09-optimizingregex.html
http://blog.stevenlevithan.com/archives/greedy-lazy-performance
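The catastrophic backtracking and atomic grouping articles listed above deal with nested quantifiers. The sketch below, again assuming Java's java.util.regex as a stand-in for the DLP engine, contrasts (a+)+b, which can backtrack exponentially on a long run of a characters with no b, with the equivalent (?>a+)b, whose atomic group is never re-entered once it has matched. Whether your DLP version accepts the (?>...) syntax is not covered here, so test it before using it in a policy. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo: nested quantifiers vs. an atomic group in java.util.regex.
public class AtomicGroupDemo {

    // Nested quantifiers: on a long run of 'a' with no 'b', the engine tries every
    // way of splitting the run between the inner and outer +. Only run it on short input.
    public static final Pattern NESTED = Pattern.compile("(a+)+b");

    // Atomic group: once (?>a+) has matched, the engine never backtracks into it.
    public static final Pattern ATOMIC = Pattern.compile("(?>a+)b");

    public static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String noB = "a".repeat(30) + "c";
        // Fails quickly: at each start position the atomic group matches once, 'b' fails, and no backtracking follows.
        System.out.println(found(ATOMIC, noB));     // false
        System.out.println(found(ATOMIC, "aaab"));  // true
        System.out.println(found(NESTED, "aaab"));  // true
    }
}
```

Both patterns accept the same strings (one or more a followed by b); only the amount of backtracking on a failed match differs.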
For more information, see the "Detecting content using regular expressions" chapter in the Symantec Data Loss Prevention Administration Guide.