
OCR Unknown Document Format


Article ID: 242863



Products

Data Loss Prevention Network Monitor and Prevent for Email and Web

Issue/Introduction

 

Many incidents show a problem with file type detection.

A large number of image files (.png, .jpg, .gif) are identified as Unknown Document Format.

On investigation, these images generally turn out to be very small images of the kind often found in email signatures.

Cause

These files are too small for the OCR server to perform reasonable OCR on.

Environment

All current versions of DLP.

Resolution

- This issue is known and will likely be fixed in a future release. In the interim, it may be possible to avoid the issue by increasing the minimum image dimension setting found in the ImageRecognition.properties file on the server.

The ImagePreclassifier.OCR_MINIMUM_IMAGE_DIM property, which is normally set to 200, can be increased to 400 or higher to filter out small images.
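
As a rough illustration, the edited entry in ImageRecognition.properties might look like the following. The property name and the values 200 and 400 come from this article; the key = value layout and the comment line are assumptions based on standard Java properties syntax, so follow whatever style the existing file already uses.

```properties
# Default is normally 200. Raised so that very small images
# (for example, email signature logos) are filtered out before OCR.
ImagePreclassifier.OCR_MINIMUM_IMAGE_DIM = 400
```

If small images are still reported as Unknown Document Format after the change, the value can be raised further in the same way.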