I’ve recently been reviewing some statistics taken from our detection servers to help with the sizing and testing of OCR, and was wondering what each metric relates to.
Example Data
INFO: May 07, 2020 10:53:11 AM: Message received with candidate OCR images. [Number of images in last 24 hours: 1225. Percentage of Messages containing images that need OCR: 0.7806%. Average number of OCR images per message: 1.0]
I understand that ‘Percentage of Messages containing images that need OCR’ (0.7806% in this example) is the percentage of all scanned messages that contain images needing OCR. However, does ‘Number of images in last 24 hours’ (1225 in this case) mean that 0.7806% of this total will be OCR compatible, or is 1225 in fact the total number of images that would be OCR scanned?
In the example, 1225 is the number of images that the image pre-classifier module deemed OCR candidates.
So had the OCR feature been enabled in that 24-hour window, the OCR servers would have received 1225 requests, not 0.78% of 1225.
This metric gives a sense of the volume of OCR requests that this detection server would generate.
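For sizing purposes, the figures in the log line can be pulled out and turned into a rough request rate. The following is a minimal Python sketch, not a product tool: the regular expression is written against the single example line above, and the function names `parse_ocr_stats` and `estimate_load` are hypothetical. The back-calculated total message count additionally assumes that the average is taken over messages that contain OCR candidate images, which the log line does not state explicitly.

```python
import re

# The example log line from above, copied verbatim.
LOG_LINE = (
    "INFO: May 07, 2020 10:53:11 AM: Message received with candidate OCR images. "
    "[Number of images in last 24 hours: 1225. "
    "Percentage of Messages containing images that need OCR: 0.7806%. "
    "Average number of OCR images per message: 1.0]"
)

# Assumption: the statistics always appear in this order and wording.
PATTERN = re.compile(
    r"Number of images in last 24 hours: (?P<images>\d+)\. "
    r"Percentage of Messages containing images that need OCR: (?P<pct>[\d.]+)%\. "
    r"Average number of OCR images per message: (?P<avg>[\d.]+)\]"
)


def parse_ocr_stats(line):
    """Extract the three OCR statistics from one log line (hypothetical helper)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not contain OCR candidate statistics")
    return {
        "images": int(match["images"]),   # OCR candidate images in the last 24 hours
        "pct": float(match["pct"]),       # % of messages with images that need OCR
        "avg": float(match["avg"]),       # average OCR images per message
    }


def estimate_load(stats):
    """Derive rough sizing numbers from the parsed statistics."""
    # Every candidate image becomes one OCR request, so the 24-hour
    # request count is the image count itself, not a percentage of it.
    ocr_requests = stats["images"]
    requests_per_hour = ocr_requests / 24

    # Assumption: "avg" is computed over messages that contain OCR candidates.
    messages_with_ocr = ocr_requests / stats["avg"]
    estimated_total_messages = messages_with_ocr / (stats["pct"] / 100)

    return {
        "ocr_requests_24h": ocr_requests,
        "requests_per_hour": requests_per_hour,
        "estimated_total_messages": estimated_total_messages,
    }


if __name__ == "__main__":
    for name, value in estimate_load(parse_ocr_stats(LOG_LINE)).items():
        print(f"{name}: {value:,.2f}")
```

For the example line this works out to 1225 OCR requests in 24 hours, or about 51 per hour on average. Under the stated assumption, the detection server scanned roughly 157,000 messages in that window. An hourly average hides peaks, so treat it as a lower bound when sizing OCR servers.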
2023/05/31 - No update needed.