We present a methodology for OCR with the following properties: the
feature extraction, training, and recognition components are
script-independent; no separate segmentation is required at the
character and word levels; and training is performed automatically on
data that is likewise not presegmented. The methodology is adapted to
OCR from continuous speech recognition, which has developed a mature
and successful technology based on Hidden Markov Models. The script
independence of the methodology is demonstrated using omnifont
experiments on the DARPA Arabic OCR Corpus and the University of
Washington English Document Image Database I. (C) 1998 Pattern
Recognition Society. Published by Elsevier Science Ltd. All rights
reserved.
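
As a rough illustration of how a text line can be handled without presegmentation, in the way a speech signal is handled as a sequence of frames, the sketch below slides a narrow vertical window across a binary line image and emits one feature vector per window position. The abstract does not specify the features, so the function name `frame_features`, the window parameters, and the two features (ink density and vertical ink centroid) are illustrative assumptions and not the authors' method.

```python
from typing import List


def frame_features(
    line_image: List[List[int]],
    frame_width: int = 2,  # assumed window width in pixels (hypothetical)
    step: int = 1,         # assumed horizontal shift between windows (hypothetical)
) -> List[List[float]]:
    """Turn a binary text-line image into a left-to-right sequence of frames.

    line_image is a list of pixel rows; 1 means ink, 0 means background.
    Each frame is [ink density, normalized vertical centroid of the ink].
    These two features are placeholders chosen for illustration only.
    """
    height = len(line_image)
    width = len(line_image[0]) if height else 0
    features = []
    for left in range(0, width - frame_width + 1, step):
        ink = 0
        weighted_rows = 0
        for row in range(height):
            count = sum(line_image[row][left:left + frame_width])
            ink += count
            weighted_rows += row * count
        density = ink / (height * frame_width)
        # An empty window gets a neutral centroid of 0.5 (an arbitrary choice).
        centroid = (weighted_rows / ink) / max(height - 1, 1) if ink else 0.5
        features.append([density, centroid])
    return features


if __name__ == "__main__":
    # A tiny 3x4 toy "line image"; no character or word boundaries are marked.
    image = [
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    for frame in frame_features(image):
        print(frame)
```

The resulting frame sequence is the kind of input a Hidden Markov Model recognizer consumes, and nothing in the extraction step depends on the script being recognized.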