You are using a RegEx rule in version 14.6.
You have written a regular expression using a circumflex (^) at the beginning of the expression.
You notice in testing that the start-of-line character '^' is not detected in .doc, .docx, and .pdf documents, although it works for plain text documents.
A simple '^INTERNAL' RegEx fails to match a sample document that shows the word 'INTERNAL' at the start of a line in each case.
For example, in a .txt, .docx, .doc, and .pdf file you have the following text:
INTERNAL
LANRETNI
internal
The regular expressions used in testing were:
Case 1:
With this RegEx, only the .txt file creates an alert:
Test original RegEx (Regular Expression): Match (?i)(^|\s{2,}|:)(INTERNAL(\s{2,}|\s*[^\w\s.,:]|\t|$))
. Count all matches.
Case 2:
With this RegEx, none of the documents create an alert:
Test original RegEx (Regular Expression): Match (?i)^(INTERNAL(\s{2,}|\s*[^\w\s.,:]|\t|$))
. Count all matches.
Case 3:
Again, none of the documents create an alert. This is the simplest RegEx case:
Test original RegEx (Regular Expression): Match (?i)^INTERNAL
. Count all matches.
The circumflex (^) matches the beginning of the string, or the beginning of a line only if the multi-line flag (m) is enabled. Without the m modifier, '^INTERNAL' on its own therefore does not match 'INTERNAL' at the start of a line within the content.
You will find this explained in online tools such as https://regexr.com/ or https://www.regexpal.com/ when the PCRE engine is selected for evaluating the regular expression.
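The anchor behavior described above can be reproduced in a short script. This is only a sketch: it uses Python's re module rather than the PCRE engine, and the "Header" line in sample_text is a hypothetical stand-in for whatever content might precede the first line of the text being evaluated. The helper name count_matches is also an assumption made for illustration.

```python
import re

# Hypothetical content: "Header" stands in for any text before the first line.
sample_text = "Header\nINTERNAL\nLANRETNI\ninternal"


def count_matches(pattern: str, text: str) -> int:
    """Return the number of non-overlapping matches of pattern in text."""
    return sum(1 for _ in re.finditer(pattern, text))


# Without (?m), ^ anchors only at the very start of the text.
without_flag = count_matches(r"(?i)^INTERNAL", sample_text)

# With (?m), ^ also anchors immediately after every newline.
with_flag = count_matches(r"(?i)(?m)^INTERNAL", sample_text)

print(without_flag, with_flag)  # prints: 0 2
```

The two matches with (?m) are 'INTERNAL' and, because of (?i), 'internal'.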
Modify the RegEx by adding (?m) for multi-line mode. The result is then successful, with two matches for each RegEx:
Case 1:
Match (?i)(?m)(^|\s{2,}|:)(INTERNAL(\s{2,}|\s*[^\w\s.,:]|\t|$))
. Count all matches.
Case 2:
Match (?i)(?m)^(INTERNAL(\s{2,}|\s*[^\w\s.,:]|\t|$))
. Count all matches.
Case 3:
Match (?i)(?m)^INTERNAL
. Count all matches.
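As a rough cross-check, the three corrected expressions can be run against the sample text from this article. The sketch below uses Python's re module, which is an assumption and not the engine used by the product, and it assumes the lines are separated by a single newline character. Results inside the product depend on how text is extracted from each file type.

```python
import re

# Sample text from the article, assuming single "\n" line separators.
sample_text = "INTERNAL\nLANRETNI\ninternal"

# The three corrected expressions, each with (?m) added.
corrected = {
    "Case 1": r"(?i)(?m)(^|\s{2,}|:)(INTERNAL(\s{2,}|\s*[^\w\s.,:]|\t|$))",
    "Case 2": r"(?i)(?m)^(INTERNAL(\s{2,}|\s*[^\w\s.,:]|\t|$))",
    "Case 3": r"(?i)(?m)^INTERNAL",
}

# Count non-overlapping matches for each expression.
counts = {
    name: sum(1 for _ in re.finditer(pattern, sample_text))
    for name, pattern in corrected.items()
}

for name, count in counts.items():
    print(name, count)  # each case prints a count of 2
```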