OCR...The Devil Is In The Details

OCR (optical character recognition) is a complex and inexact process. I covered some of this in a previous post. The video below takes a deeper dive into the processing, and just a touch of the math, behind how OCR works.

Because the process is so complex, it is important to begin with the best-quality original scan possible. Some scan settings to keep in mind:
  1. Color - Set this to Grayscale. Grayscale captures as much detail as possible from the original without making the file size too large.
  2. Resolution (DPI) - Set this to 300 dpi. This is the minimum resolution recommended for OCR.
  3. Format - Save the file as a PDF.
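
To get a feel for the file-size trade-off behind the Grayscale setting, here is a rough Python sketch that estimates the uncompressed size of one scanned page. It assumes a US Letter page (8.5 x 11 inches) and typical bit depths of 1, 8, and 24 bits per pixel for black-and-white, grayscale, and color scans. The function name and the page size are my own choices for illustration, and a real PDF will usually be smaller because scanners compress the image data.

```python
# Rough estimate of the uncompressed size of one scanned page.
# Assumed bit depths: 1 (black and white), 8 (grayscale), 24 (color).
BITS_PER_PIXEL = {"bitonal": 1, "grayscale": 8, "color": 24}


def raw_scan_bytes(width_in, height_in, dpi, mode):
    """Return the uncompressed size in bytes of a page scanned at `dpi`."""
    width_px = round(width_in * dpi)    # pixels across the page
    height_px = round(height_in * dpi)  # pixels down the page
    return width_px * height_px * BITS_PER_PIXEL[mode] // 8


if __name__ == "__main__":
    # US Letter page at the recommended 300 dpi (page size is an assumption).
    for mode in BITS_PER_PIXEL:
        size_mb = raw_scan_bytes(8.5, 11, 300, mode) / 1_000_000
        print(f"{mode:>9}: {size_mb:.1f} MB")
```

Under these assumptions, a grayscale page comes out at roughly a third of the raw size of a color page, while keeping far more tonal detail than a 1-bit black-and-white scan.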





