Optical Character Recognition (OCR) converts scanned images of handwritten, typewritten or printed text into searchable, editable documents. OCR software can tell text apart from images, and can tell individual characters apart from one another.
Paper has been displaced from some activities. For example, the vast majority of journeys on the London Underground are made with an Oyster card, and no paper ticket is issued. The paperless office has been talked about for more than 40 years, yet offices have been slow to get rid of the mountains of paper they generate. That has started to change in the past few years, with a marked shift towards the paperless office.

Paper documents contain a wealth of important management data and information that would be better stored electronically, and software exists to make this conversion possible. Scanning documents is not only useful for archiving. OCR technology is vital for accessing paper-based information and for integrating that information into digital workflows.
OCR software is not mainstream, so open source alternatives to heavyweight proprietary products (such as OmniPage, ReadIRIS and CVision PdfCompressor) are fairly thin on the ground. Matters are further complicated because OCR software needs sophisticated algorithms to turn an image of text into accurate, machine-readable text. It also has to cope with pages that contain far more than text, such as complex layouts, images, graphics and tables, in both single-page and multi-page documents.
Here’s our recommended free and open source OCR software for Windows.
| Optical Character Recognition | |
|---|---|
| Tesseract | Originally developed at Hewlett-Packard Laboratories. Tesseract runs from the command line |
| OCRopus | Python-based tools for document analysis and OCR |
| GOCR | Reads images in many formats and outputs a text file |
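Because Tesseract is driven from the command line, it is easy to script. Below is a minimal sketch of calling it from Python. It assumes Tesseract 4 or later is installed and its executable is on the `PATH` (the Windows installer does not always add it). It also assumes the English language data (`eng`) is present. The helper names `build_tesseract_command` and `ocr_image` are hypothetical and exist only for this illustration. The call relies on Tesseract's basic `tesseract IMAGE OUTPUTBASE [options]` form, where passing `stdout` as the output base prints the recognised text instead of writing `OUTPUTBASE.txt`.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(
    image_path: str,
    output_base: str = "stdout",
    lang: str = "eng",
    psm: Optional[int] = None,
) -> List[str]:
    """Build the argument list for the tesseract CLI (hypothetical helper).

    Basic form: tesseract IMAGE OUTPUTBASE -l LANG [--psm N]
    "stdout" as OUTPUTBASE prints the text instead of writing OUTPUTBASE.txt.
    """
    cmd = ["tesseract", image_path, output_base, "-l", lang]
    if psm is not None:
        # Page segmentation mode (Tesseract 4+ syntax), e.g. 6 assumes
        # a single uniform block of text.
        cmd += ["--psm", str(psm)]
    return cmd


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run tesseract on one image and return the recognised text.

    Assumes the tesseract executable is on PATH (an assumption, not a given
    on Windows) and that the requested language data is installed.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, "stdout", lang),
        capture_output=True,  # collect the OCR text from stdout
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For example, `print(ocr_image("scan.png"))` prints the text found in a hypothetical `scan.png`. Building the argument list as a Python list, rather than as a single shell string, avoids quoting problems with Windows paths that contain spaces.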