We have been having trouble scanning a few PDF files. We scanned them on premises with the SharePoint connector, with OCR enabled. When we run filter.exe against one of these files, it completes, but the output file is blank. We scanned another document to rule out a problem with the program itself, and it returned the expected result.
Release: 15.8, 16.0
The OCR extraction method used by DLP can extract image content from PDF forms created with AcroForms.
PDF content created by other methods (e.g., "XFA") will not allow the DLP OCR engine to extract a readable image.
If the OCR engine finds no images at all, the usual cause is that the images do not meet the image quality and size requirements (see Image Quality and Resolution for OCR results (broadcom.com)).
In some cases, however, the type of PDF also prevents image extraction, e.g., "XFA" (XML Forms Architecture).
A form created with XFA may be identifiable from its document properties, which can be viewed in Acrobat Reader through the "File > Properties" menu.
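To check several files without opening each one in Acrobat Reader, a rough programmatic check may help. The sketch below is a heuristic of our own, not part of DLP: it relies on the PDF specification (ISO 32000) declaring an XFA form through an /XFA entry in the document's AcroForm dictionary, and searches for that key both in the raw bytes and inside Flate-compressed streams. The function name `looks_like_xfa` is hypothetical. It assumes unencrypted files, ignores streams that use filters other than FlateDecode, and will also flag "hybrid" forms that carry both AcroForm fields and XFA data.

```python
import re
import zlib

# Captures the body of each "stream ... endstream" section in the raw PDF bytes.
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_like_xfa(pdf_bytes: bytes) -> bool:
    """Heuristic: return True if the PDF appears to declare an XFA form.

    Assumption: an XFA form is signalled by an /XFA key in the AcroForm
    dictionary. That dictionary may be stored uncompressed or inside a
    Flate-compressed object stream, so both places are searched.
    """
    # Case 1: the AcroForm dictionary is stored as plain text.
    if b"/XFA" in pdf_bytes:
        return True

    # Case 2: the dictionary sits inside a compressed stream.
    for match in _STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes (e.g. the EOL before
            # "endstream"); they end up in unused_data instead of raising.
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded (or encrypted); skip this stream
        if b"/XFA" in data:
            return True
    return False
```

Usage, assuming a local copy of the problem file (the path is hypothetical):

```python
with open("problem_form.pdf", "rb") as f:
    print(looks_like_xfa(f.read()))
```

A proper PDF parser would be more reliable; the byte search is used here only so the check needs nothing beyond the Python standard library.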
There is a Feature Request for this issue, PM-2963: "Support content extraction for XFA-based PDF forms".