OCR Unknown Document Format



Article ID: 242863


Updated On:


Data Loss Prevention Network Monitor and Prevent for Email and Web
Data Loss Prevention Enterprise Suite
Data Loss Prevention Form Recognition


Many incidents show that the OCR server was unable to detect the file format.

Many image files (.png, .jpg, .gif) are identified as Unknown Document Format.


On investigation, these generally turn out to be very small images, such as those often found in email signatures.




These images are too small for the OCR server to perform reliable detection on them.


This issue is known and will likely be fixed in a future release. In the interim, it may be possible to avoid the issue by increasing the minimum image dimension setting in the ImageRecognition.properties file on the server [<dir>:\Program Files\Symantec\DataLossPrevention\DetectionServer\<DLPVersion>\Protect\config].

ImagePreclassifier.OCR_MINIMUM_IMAGE_DIM, which is normally set to 200, can be increased to 400 or higher to filter out small images.
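
The change described above can be made by hand in a text editor. As an illustration only, the following Python sketch shows one way to script the same edit. It assumes the file uses the usual key = value properties syntax; the set_property helper and the sample text are hypothetical and are not part of the product. Back up the real file before changing it.

```python
import re

# Property name taken from the article; everything else here is illustrative.
PROPERTY = "ImagePreclassifier.OCR_MINIMUM_IMAGE_DIM"


def set_property(text: str, key: str, value: str) -> str:
    """Return `text` with `key` set to `value`; append the key if it is absent."""
    # Match an uncommented "key = old" or "key: old" line, keeping the
    # original spacing around the separator.
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*[=:][ \t]*).*$",
        re.MULTILINE,
    )
    if pattern.search(text):
        return pattern.sub(lambda m: m.group(1) + value, text, count=1)
    # Key not found: add it on a new line at the end.
    sep = "" if (not text or text.endswith("\n")) else "\n"
    return f"{text}{sep}{key} = {value}\n"


# Hypothetical excerpt standing in for ImageRecognition.properties.
sample = "# hypothetical excerpt\nImagePreclassifier.OCR_MINIMUM_IMAGE_DIM = 200\n"
updated = set_property(sample, PROPERTY, "400")
print(updated)
```

To apply this to the real file, read its contents, pass them through set_property, and write the result back to the same path.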

Additional Information

You may also be interested in the following articles:

Article ID: 221599 What are the default image prefilter settings for a detection server

Article ID: 254861 Image Quality and Resolution for OCR results

Article ID: 160504 Detect sensitive data in an image file with DLP