OCR
From Image to Text
The art of converting the pixels of an image into machine-editable text can look back on a long tradition of successful development. The programs available today often produce quite respectable, and sometimes even astonishing, results. Working with historical sources, however, is quite a different story: doing the job in this field requires not only adequate hardware but also a considerable amount of expertise and experience.

The historical period the AAC has been working on in recent years poses a number of particular challenges, such as poor paper quality, small type sizes and black-letter typefaces. These features occur in varying combinations, and each combination calls for its own procedure.

In doing this, we have tried to tap the full potential of existing solutions. In many cases training the software has been helpful; very often, more time-consuming interactive approaches had to be chosen. In a number of cases, standard OCR procedures turned out to be unjustifiably inefficient or yielded little usable output. In such circumstances OCR was replaced by double-keying, a technique which has been applied in particular to newspaper sources, in cooperation with a Beijing-based company.
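
Double-keying is generally understood to mean that the same text is typed independently by two people and the two versions are then compared, so that keying errors show up as disagreements. The following Python sketch illustrates such a comparison step only; it is not a description of the workflow actually used for the AAC. The function name `keying_disagreements` and the sample strings are hypothetical, and word-level comparison with the standard library module `difflib` is an assumption made for the sake of the example.

```python
import difflib


def keying_disagreements(first_keying: str, second_keying: str):
    """Return (words_from_first, words_from_second) pairs where two keyings differ.

    Hypothetical helper: compares two independent transcriptions word by word.
    """
    a = first_keying.split()
    b = second_keying.split()
    # autojunk=False keeps frequent words from being ignored in long texts
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side means words are missing in that keying
            disagreements.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return disagreements


# Invented sample lines standing in for two typists' versions of one passage
typist_a = "poor paper quality and small letter sizes"
typist_b = "poor paper qualitv and small letter sizes"
print(keying_disagreements(typist_a, typist_b))
```

Each reported pair would then be checked against the page image by a person; passages on which both keyings agree are taken as correct.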

[Figures: top, Pro Lector - OCR; bottom, Fine Reader - OCR]