The DLP Network Prevent for Email Detection Server does not mark some PDFs as protected. This happens with PDFs that are only partially protected: the file as a whole is not encrypted or password protected, but portions of its content are. The same file is correctly marked as protected when detection is run locally by the DLP Endpoint Agent.
Running the PDF through filter.exe reports the file as PasswordProtected.
Enabling CEH logging on the Email Detection Server shows the PDF is marked as not encrypted (isEncrypted: 0) by the ImageExtractorPlugin:
05/28/25 16:54:13 | WARN | cehost | FileTypeIdentifierRequestExecutor [1276] | [2596] | Doing file type identification with: ImageExtractorPlugin | C:\Git\dlp-detection-core-native\ContentExtractionAPI\CEHost\FileTypeIdentifierRequestExecutor.cpp (200)
05/28/25 16:54:13 | WARN | cehost | FileTypeIdentifierRequestExecutor [1276] | [2596] | ImageExtractorPlugin: identified the stream as: pdf, isEncrypted: 0 | C:\Git\dlp-detection-core-native\ContentExtractionAPI\CEHost\FileTypeIdentifierRequestExecutor.cpp (223)
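When reviewing a large CEH log, it can help to pull out just the identification lines and their isEncrypted value. The following is a minimal Python sketch, not a Symantec tool. The pattern is inferred only from the sample line above, so it assumes the same wording and that isEncrypted is logged as 0 or 1. The name parse_identification is hypothetical.

```python
import re

# Pattern inferred from the sample CEH log line above (assumption: other
# builds or plugins may word this message differently).
IDENTIFIED = re.compile(
    r"(?P<plugin>\w+): identified the stream as: "
    r"(?P<file_type>\w+), isEncrypted: (?P<encrypted>[01])"
)


def parse_identification(line):
    """Return plugin, file type, and encryption flag, or None if no match."""
    match = IDENTIFIED.search(line)
    if match is None:
        return None
    return {
        "plugin": match.group("plugin"),
        "file_type": match.group("file_type"),
        "is_encrypted": match.group("encrypted") == "1",
    }


# The sample line from this article, as a raw string to keep backslashes.
sample = (
    r"05/28/25 16:54:13 | WARN | cehost | FileTypeIdentifierRequestExecutor "
    r"[1276] | [2596] | ImageExtractorPlugin: identified the stream as: pdf, "
    r"isEncrypted: 0 | C:\Git\dlp-detection-core-native\ContentExtractionAPI"
    r"\CEHost\FileTypeIdentifierRequestExecutor.cpp (223)"
)

result = parse_identification(sample)
print(result)
```

Applied to each line of the log, this makes it easy to list every stream that ImageExtractorPlugin identified as not encrypted.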
FileReader logs show an image was extracted from the PDF during detection:
May 28, 2025 4:54:14 PM com.symantec.dlp.imagepreclassifier.NativeImagePreclassificationProvider shouldPerformOcr
INFO: [8824] Image id image_extractor_plugin_embedded_image with size #### from message id ##### classified as colorType: RGB_IMAGE_DARK C:\VontuDev\workDir\ImagePreclassifier\ImagePreclassifier\ImagePropertyProviderImpl.cpp 230
Symantec Data Loss Prevention 16.x
OCR
When OCR is enabled on the Detection Server, ImageExtractorPlugin runs first during file type identification. If it detects and extracts an image from the non-encrypted portions of the PDF, it flags the entire file as not encrypted. If no image is detected, the file goes through normal detection and is flagged as protected.
The PDF is correctly flagged as a protected/encrypted file with Endpoint Agent detection because OCR does not run on endpoints.
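The outcomes described above can be summarized as a small decision function. This is only an illustrative Python model of the documented behavior for a partially protected PDF, not the product's actual code. The function and parameter names are hypothetical, and the model assumes that a Detection Server with OCR disabled behaves like the normal detection path.

```python
def flagged_as_protected(ocr_enabled, image_extracted):
    """Toy model: is a partially protected PDF flagged as protected?

    ocr_enabled     -- OCR runs on this detection component (hypothetical flag)
    image_extracted -- ImageExtractorPlugin extracted an image from the
                       non-encrypted portions of the PDF (hypothetical flag)
    """
    if ocr_enabled and image_extracted:
        # ImageExtractorPlugin identifies the file first and reports the
        # whole file as not encrypted.
        return False
    # Otherwise the file goes through normal detection and is flagged.
    return True


# Detection Server with OCR, image found in the unprotected portion.
server_with_image = flagged_as_protected(ocr_enabled=True, image_extracted=True)
# Detection Server with OCR, no image found.
server_no_image = flagged_as_protected(ocr_enabled=True, image_extracted=False)
# Endpoint Agent: OCR does not run, so no image is extracted.
endpoint = flagged_as_protected(ocr_enabled=False, image_extracted=False)

print(server_with_image, server_no_image, endpoint)
```

Only the first case, OCR enabled with an image extracted, produces the unflagged result reported in this article.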
Working as designed