KEYWORD SPOTTING IN POORLY PRINTED DOCUMENTS USING PSEUDO-2D HIDDEN MARKOV-MODELS

Authors

KUO SS; AGAZZI OE

Citation

S.S. Kuo and O.E. Agazzi, KEYWORD SPOTTING IN POORLY PRINTED DOCUMENTS USING PSEUDO-2D HIDDEN MARKOV-MODELS, IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(8), 1994, pp. 842-848

Subject Categories

Computer Sciences; Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

Journal title

IEEE Transactions on Pattern Analysis and Machine Intelligence

ISSN journal

0162-8828

Volume

16

Issue

8

Year of publication

1994

Pages

842 - 848

Database

ISI

SICI code

0162-8828(1994)16:8<842:KSIPPD>2.0.ZU;2-P

Abstract

An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented. For each keyword, two statistical models, named pseudo 2-D Hidden Markov Models, are created to represent the actual keyword and all other extraneous words, respectively. Dynamic programming is then used to match an unknown input word against the two models and to make a maximum likelihood decision. Although the models are pseudo 2-D in the sense that they are not fully connected 2-D networks, they are shown to be general enough to characterize printed words efficiently. These models provide an "elastic matching" property in both the horizontal and vertical directions, which makes the recognizer not only independent of size and slant but also tolerant of highly deformed and noisy words. The system is evaluated on a synthetically created database that contains about 26000 words. Currently, we achieve a recognition accuracy of 99% when words in the testing and training sets are of the same font size, and 96% when they are of different sizes. In the latter case, the conventional 1-D HMM achieves only a 70% accuracy rate.
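
The two-model decision described in the abstract can be illustrated with a small sketch. This is not the authors' pseudo 2-D model: it is a simplified 1-D, discrete-observation HMM scored with the Viterbi dynamic-programming recursion, and every name and number in it (`viterbi_log_score`, `is_keyword`, the toy `keyword_hmm` and `filler_hmm`) is a hypothetical chosen for illustration only. It shows the general idea of scoring an input against a keyword model and an "extraneous word" model and keeping the more likely one.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the single best state path for `obs` under one HMM.

    start[j]    : probability of starting in state j
    trans[i][j] : probability of moving from state i to state j
    emit[j][o]  : probability that state j emits symbol o
    """
    n = len(start)
    # delta[j] = best log score of any path that ends in state j so far
    delta = [log(start[j]) + log(emit[j][obs[0]]) for j in range(n)]
    for o in obs[1:]:
        delta = [
            max(delta[i] + log(trans[i][j]) for i in range(n)) + log(emit[j][o])
            for j in range(n)
        ]
    return max(delta)


def is_keyword(obs, keyword_hmm, filler_hmm):
    """Maximum likelihood decision between the keyword and filler models."""
    return viterbi_log_score(obs, *keyword_hmm) > viterbi_log_score(obs, *filler_hmm)


# Toy models (assumed values): the keyword is "symbol 0 followed by symbol 1",
# modeled left-to-right; the filler model emits both symbols uniformly.
keyword_hmm = (
    [1.0, 0.0],                 # always start in state 0
    [[0.5, 0.5], [0.0, 1.0]],   # stay or advance; never go back
    [[0.9, 0.1], [0.1, 0.9]],   # state 0 prefers symbol 0, state 1 prefers 1
)
filler_hmm = ([1.0], [[1.0]], [[0.5, 0.5]])

print(is_keyword([0, 0, 1, 1], keyword_hmm, filler_hmm))        # True
print(is_keyword([0, 0, 0, 0, 1, 1], keyword_hmm, filler_hmm))  # True
print(is_keyword([1, 1, 0, 0], keyword_hmm, filler_hmm))        # False
```

The second call stretches the first half of the pattern and is still accepted, because the self-transitions let a state absorb a variable number of observations. This is a 1-D analogue of the elastic matching the abstract attributes to the pseudo 2-D models, which apply the same kind of flexibility in both the horizontal and vertical directions.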