The words in a key phrase in a PDF file are separated by spaces and the entire phrase is not detected by DLP.
However, the search function in Adobe Acrobat Reader is able to find the entire key phrase in the PDF file.
When a file from another application is converted to PDF, the conversion sometimes adds newlines and other formatting to the original text.
MS Word is known to do this to conform to the margin and document structure requirements of PDF.
What looks like a single space may actually be a Line Feed (LF, also called a new line) or a Carriage Return (CR) in the document.
You can see this by extracting the raw PDF text with the DLP filter.exe and viewing the cracked content in a file editor that can display symbols, e.g. Notepad++.
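If a symbol-aware editor is not at hand, the same check can be sketched in a few lines of Python. This is only an illustration: the function name show_control_chars and the sample text are hypothetical, and the sketch assumes the extracted text is already available as a string (for example, read from the file that filter.exe produced).

```python
def show_control_chars(text: str) -> str:
    """Replace CR, LF and tab characters with visible tokens."""
    visible = {"\r": "[CR]", "\n": "[LF]", "\t": "[TAB]"}
    return "".join(visible.get(ch, ch) for ch in text)


# Hypothetical extracted text: the "space" between the first two
# words is really a line feed added during the PDF export.
sample = "strictly\nconfidential report"
print(show_control_chars(sample))
```

Any [LF] or [CR] token that appears inside the key phrase marks a position where the phrase has been split.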
When a keyword phrase rule fails to detect a phrase, an LF or new line inserted between its words may be breaking up the original key phrase.
DLP does not detect the phrase because it extracts the content exactly as it is formatted in the scanned document.
Currently, there is no way to distinguish new lines present in the original document from new lines added as part of an export process.
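The effect of the inserted line break can be illustrated with a small Python sketch. It does not reflect how DLP performs matching internally; the function names and sample text are hypothetical and only show why an exact phrase comparison fails while a comparison that accepts any whitespace between the words succeeds.

```python
import re


def literal_match(phrase: str, text: str) -> bool:
    """Exact match: the words must be separated by the same characters."""
    return phrase in text


def whitespace_tolerant_match(phrase: str, text: str) -> bool:
    """Match the words of the phrase with any run of whitespace between them."""
    pattern = r"\s+".join(re.escape(word) for word in phrase.split())
    return re.search(pattern, text) is not None


# Hypothetical extracted content: the export replaced a space with LF.
extracted = "This document is strictly\nconfidential."

print(literal_match("strictly confidential", extracted))
print(whitespace_tolerant_match("strictly confidential", extracted))
```

The exact comparison misses the phrase because the character between the two words is an LF rather than a space, whereas the whitespace-tolerant pattern treats a space, LF, or CR as equivalent separators.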
To detect the phrase in this situation, you can do either of the following: