A regular expression that uses the start-of-line character '^' does not detect content in doc, docx, or pdf documents.

Article ID: 173104

Products

Symantec Products

Issue/Introduction

You are using a RegEx rule in version 14.6.

You have written a regular expression with a circumflex (^) at the beginning of the expression.

In testing, you notice that the start-of-line character '^' is not detected in doc, docx, and pdf documents, although it works for a plain text document.

Even a simple '^INTERNAL' RegEx fails to match a sample document in which the word 'INTERNAL' appears at the start of a line.

For example, a .txt, .docx, .doc, and .pdf file each contain the following text:

INTERNAL
LANRETNI
internal

The regular expressions used in testing were: 

Case 1:

With this RegEx, only the .txt file creates an alert:

Test original RegEx (Regular Expression): Match (?i)(^|\s{2,}|:)(INTERNAL(\s{2,}|\s*[^\w\s.,:]|\t|$)). Count all matches.

Case 2:

With this RegEx, none of the documents create an alert:

Test original RegEx (Regular Expression): Match (?i)^(INTERNAL(\s{2,}|\s*[^\w\s.,:]|\t|$)). Count all matches.

Case 3:

Again, none of the documents create an alert. This is the simplest RegEx case:

Test original RegEx (Regular Expression): Match (?i)^INTERNAL. Count all matches.

 

Cause

The '^' (beginning) anchor matches the beginning of the string, or the beginning of a line only if the multi-line flag (m) is enabled. Without the m modifier, '^' does not match at the start of individual lines, so '^INTERNAL' on its own will not match a line that begins with INTERNAL unless that line is the very first text in the content.

You can confirm this behavior with an online tool such as https://regexr.com/ or https://www.regexpal.com/, with the engine set to PCRE when evaluating the regular expression.
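The anchor behavior described above can be reproduced outside the product. The following is a minimal sketch using Python's re module, which is not the PCRE engine but treats '^' and the multi-line flag the same way for this case. The sample text and the leading "Document title" line are assumptions made for illustration; they stand in for content where INTERNAL is not the very first text.

```python
import re

# Hypothetical sample content: the keyword appears at the start of a line,
# but not at the very start of the whole string.
sample_text = "Document title\nINTERNAL\nLANRETNI\ninternal"

# Without multi-line mode, '^' anchors only at the start of the whole string.
without_multiline = re.findall(r"(?i)^INTERNAL", sample_text)

# With multi-line mode, '^' also anchors immediately after every newline.
with_multiline = re.findall(r"(?i)^INTERNAL", sample_text, flags=re.MULTILINE)

print(without_multiline)  # []
print(with_multiline)     # ['INTERNAL', 'internal']
```

The only difference between the two calls is the multi-line flag, which isolates '^' as the reason the first search finds nothing.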

Resolution

Modify the RegEx by adding (?m) to enable multi-line mode. Each RegEx then succeeds with two matches:

Case 1:

Match (?i)(?m)(^|\s{2,}|:)(INTERNAL(\s{2,}|\s*[^\w\s.,:]|\t|$)). Count all matches.

Case 2:

Match (?i)(?m)^(INTERNAL(\s{2,}|\s*[^\w\s.,:]|\t|$)). Count all matches.

Case 3:

Match (?i)(?m)^INTERNAL. Count all matches.
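As a rough cross-check, the three modified expressions can be run against the sample text from this article. This is a sketch using Python's re module rather than the product's detection engine, so it only illustrates the expected match counts; the helper name count_matches is hypothetical.

```python
import re

# Sample text from the Issue/Introduction section.
sample_text = "INTERNAL\nLANRETNI\ninternal"

# The three expressions from the Resolution, with (?m) added.
patterns = {
    "Case 1": r"(?i)(?m)(^|\s{2,}|:)(INTERNAL(\s{2,}|\s*[^\w\s.,:]|\t|$))",
    "Case 2": r"(?i)(?m)^(INTERNAL(\s{2,}|\s*[^\w\s.,:]|\t|$))",
    "Case 3": r"(?i)(?m)^INTERNAL",
}


def count_matches(pattern: str, text: str) -> int:
    """Count non-overlapping matches, similar in spirit to 'Count all matches'."""
    return sum(1 for _ in re.finditer(pattern, text))


counts = {case: count_matches(p, sample_text) for case, p in patterns.items()}
for case, n in counts.items():
    print(f"{case}: {n} matches")
```

In this sketch each case reports two matches: the line 'INTERNAL' and the line 'internal', which matches because of the case-insensitive flag (?i).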