I have a lot of incidents that show a problem with file type detection.
Many image files (.png, .jpg, .gif) are identified as Unknown Document Format.
On investigation, these generally turn out to be very small images, often the kind found in email signatures.
These images are too small for the OCR server to perform reliable OCR on.
All current versions of DLP.
- This issue is known and will likely be fixed in a future release. In the interim, it may be possible to avoid the issue by increasing the minimum image size setting found in the ImageRecognition.properties file on the server.
ImagePreclassifier.OCR_MINIMUM_IMAGE_DIM, which is normally set to 200, can be increased to 400 or higher in order to filter out small images.
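As an illustration only, the changed entry in ImageRecognition.properties might look like the fragment below. This assumes the file uses the usual Java-style key=value properties syntax, which this article does not confirm. The value 400 is just the starting point suggested above. Check the existing entry in your own file and keep its formatting.

```properties
# Assumed key=value syntax; match the format of the existing entry in your file.
# Default is normally 200. A higher value filters out more small images.
ImagePreclassifier.OCR_MINIMUM_IMAGE_DIM=400
```

If small signature images are still reported as Unknown Document Format after the change, a higher value can be tried. Keep in mind that a higher threshold would also skip OCR for any legitimate images that fall below it.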