
Image Quality and Resolution for OCR results


Article ID: 254861


Updated On:

Products

Data Loss Prevention, Data Loss Prevention Cloud Storage, Data Loss Prevention Form Recognition

Issue/Introduction

You are getting inconsistent results for OCR detection.

Environment

Release: 15. +

Cause

Image quality and resolution must meet the minimum requirements for OCR.

Resolution

Image quality and Western language resolution guidance:
Any upper-case Latin character should be at least 18 pixels tall, and the entire page should be no larger than 8400 pixels. The best resolution for black-and-white (B/W) images is 300 or 400 dpi. For grayscale or color images, the optimal recognition resolution is 150 to 300 dpi.
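The Western language guidance above can be expressed as a simple pre-scan checklist. The following is a minimal sketch, not part of the product: the function names and the `RECOMMENDED_DPI` table are hypothetical, and it assumes that "300 or 400 dpi" for B/W images can be treated as an inclusive range.

```python
# Illustrative only: these names are not part of any DLP API.
# Recommended scan resolutions in dpi, taken from the guidance above.
# Assumption: "300 or 400 dpi" for B/W is treated as the range 300-400.
RECOMMENDED_DPI = {
    "bw": (300, 400),
    "grayscale": (150, 300),
    "color": (150, 300),
}

# Minimum height, in pixels, of an upper-case Latin character.
MIN_CAPITAL_HEIGHT_PX = 18


def dpi_is_recommended(image_type: str, dpi: int) -> bool:
    """Return True if dpi falls inside the recommended range for image_type."""
    low, high = RECOMMENDED_DPI[image_type]
    return low <= dpi <= high


def capital_height_ok(height_px: float) -> bool:
    """Return True if an upper-case character is tall enough for OCR."""
    return height_px >= MIN_CAPITAL_HEIGHT_PX
```

For example, `dpi_is_recommended("color", 400)` returns `False` because 400 dpi is above the 150 to 300 dpi range suggested for color images.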

CJK Language Resolution guidance:
For reliable CJK text detection in an image, body text should be 12 points ("small four" in Chinese type sizes) and scanned at 300 dpi, which yields characters of around 48 x 48 pixels. The minimum character size is about 30 x 30 pixels, which corresponds to 7.5 points at 300 dpi.
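The point sizes and pixel counts above follow from the standard definition of a point as 1/72 of an inch. The sketch below shows the arithmetic; the helper names are hypothetical, and the computed values describe the nominal type size rather than the exact glyph size, which is why they come out slightly above the approximate figures in the guidance.

```python
# Illustrative only: helper names are not part of any DLP API.
POINTS_PER_INCH = 72  # standard typographic point


def points_to_pixels(points: float, dpi: float) -> float:
    """Convert a type size in points to pixels at a given scan resolution."""
    return points * dpi / POINTS_PER_INCH


def min_dpi_for(points: float, min_pixels: float = 30) -> float:
    """Lowest scan resolution at which text of this size reaches min_pixels."""
    return min_pixels * POINTS_PER_INCH / points


# 12-point body text at 300 dpi is nominally 50 pixels (guidance: around 48).
recommended_px = points_to_pixels(12, 300)

# 7.5-point text at 300 dpi is nominally 31.25 pixels (guidance: about 30).
minimum_px = points_to_pixels(7.5, 300)

# 12-point text needs at least 180 dpi to reach the 30-pixel minimum.
lowest_dpi = min_dpi_for(12)
```

The same conversion can be used to estimate whether a lower scan resolution still keeps characters above the 30 x 30 pixel minimum for a given font size.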

OCR image resolution guidance:
For all OCR languages, OCR is not attempted on any image smaller than 16 pixels or larger than 8400 pixels.
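A pre-check along these lines can help explain why some images produce no OCR results at all. This is a minimal sketch with a hypothetical function name. It assumes the 16 and 8400 pixel limits apply to both the width and the height of the image, which the guidance does not state explicitly.

```python
# Illustrative only: not part of any DLP API.
# Assumption: the limits apply to each dimension (width and height).
MIN_IMAGE_PX = 16
MAX_IMAGE_PX = 8400


def ocr_will_attempt(width_px: int, height_px: int) -> bool:
    """Return True if both dimensions fall within the OCR size limits."""
    return all(
        MIN_IMAGE_PX <= side <= MAX_IMAGE_PX
        for side in (width_px, height_px)
    )
```

For example, `ocr_will_attempt(1000, 8401)` returns `False` under this assumption, because the height exceeds 8400 pixels.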

Image orientation, scripts, and typefaces:
Image orientation detection is known to work in most situations. However, we cannot provide a specific figure because many factors influence OCR, such as resolution, sharpness, and noise in the image.
Text extraction can work with most scripts and typefaces as long as characters do not overlap and can be individually distinguished.

Number of languages per image:
OCR determines the dominant language in the image and extracts text for that language. The dominant language is selected based on many factors, such as resolution, font size, sharpness, and noise.

Image transformations:
As long as the image is sharp and has acceptable quality and resolution, OCR can extract text in the dominant language.

Additional Information

You may also want to review:

What are the default image prefilter settings for a detection server
Detect sensitive data in an image file with DLP