KEYWORD SPOTTING IN POORLY PRINTED DOCUMENTS USING PSEUDO-2D HIDDEN MARKOV-MODELS

Authors
Citation
Ss. Kuo and Oe. Agazzi, KEYWORD SPOTTING IN POORLY PRINTED DOCUMENTS USING PSEUDO-2D HIDDEN MARKOV-MODELS, IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(8), 1994, pp. 842-848
Number of citations
11
Subject Categories
Computer Sciences; Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
ISSN journal
01628828
Volume
16
Issue
8
Year of publication
1994
Pages
842 - 848
Database
ISI
SICI code
0162-8828(1994)16:8<842:KSIPPD>2.0.ZU;2-P
Abstract
An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented. For each keyword, two statistical models, named pseudo 2-D Hidden Markov Models, are created: one represents the actual keyword and the other represents all extraneous words. Dynamic programming is then used to match an unknown input word against the two models and to make a maximum likelihood decision. Although the models are pseudo 2-D in the sense that they are not fully connected 2-D networks, they are shown to be general enough to characterize printed words efficiently. These models provide an "elastic matching" property in both the horizontal and vertical directions, which makes the recognizer not only independent of size and slant but also tolerant of highly deformed and noisy words. The system is evaluated on a synthetically created database that contains about 26000 words. Currently, we achieve a recognition accuracy of 99% when words in the testing and training sets are of the same font size, and 96% when they are of different sizes. In the latter case, the conventional 1-D HMM achieves only a 70% accuracy rate.
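The two-model decision described in the abstract can be illustrated with a minimal sketch. It is deliberately simplified and is not the authors' implementation: it uses ordinary 1-D discrete HMMs rather than pseudo 2-D models, and the names `viterbi_log_score`, `is_keyword`, `KEYWORD` and `FILLER`, as well as all probability values, are hypothetical toy choices. It only shows the general idea of scoring an input against a keyword model and an extraneous-word ("filler") model by dynamic programming and choosing the more likely one.

```python
import math

NEG_INF = float("-inf")


def ln(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def to_log(init, trans, emit):
    """Convert a toy HMM given as probabilities into log-probabilities."""
    return (
        [ln(p) for p in init],
        [[ln(p) for p in row] for row in trans],
        [[ln(p) for p in row] for row in emit],
    )


def viterbi_log_score(obs, log_init, log_trans, log_emit):
    """Log-probability of the best state path for a discrete observation sequence."""
    n_states = len(log_init)
    # Best-path score ending in each state after the first observation.
    score = [log_init[s] + log_emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        # Dynamic-programming step: extend the best predecessor path into each state.
        score = [
            max(score[p] + log_trans[p][s] for p in range(n_states)) + log_emit[s][o]
            for s in range(n_states)
        ]
    return max(score)


def is_keyword(obs, keyword_model, filler_model):
    """Decide for the keyword when its model scores higher than the filler model."""
    return viterbi_log_score(obs, *keyword_model) > viterbi_log_score(obs, *filler_model)


# Hypothetical toy models over a two-symbol alphabet {0, 1}.
# Keyword: left-to-right, expects a run of 0s followed by a run of 1s.
KEYWORD = to_log(
    init=[1.0, 0.0],
    trans=[[0.5, 0.5], [0.0, 1.0]],
    emit=[[0.9, 0.1], [0.1, 0.9]],
)
# Filler: a single state that emits both symbols equally often.
FILLER = to_log(init=[1.0], trans=[[1.0]], emit=[[0.5, 0.5]])

if __name__ == "__main__":
    print(is_keyword([0, 0, 1, 1], KEYWORD, FILLER))
    print(is_keyword([1, 0, 1, 0], KEYWORD, FILLER))
```

Because each left-to-right state may be held for a variable number of observations, the best path stretches or compresses to fit the input. That is a 1-D analogue of the elastic matching the abstract attributes to the pseudo 2-D models, which apply it in both directions.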