Ss. Kuo and Oe. Agazzi, KEYWORD SPOTTING IN POORLY PRINTED DOCUMENTS USING PSEUDO-2D HIDDEN MARKOV-MODELS, IEEE transactions on pattern analysis and machine intelligence, 16(8), 1994, pp. 842-848
An algorithm for robust machine recognition of keywords embedded in a
poorly printed document is presented. For each keyword, two
statistical models, named pseudo 2-D Hidden Markov Models, are
created: one representing the actual keyword and one representing all
other, extraneous words. Dynamic programming is then used to match an
unknown input word against the two models and to make a maximum
likelihood decision. Although the models are pseudo 2-D in the sense
that they are not fully connected 2-D networks, they are shown to be
general enough to characterize printed words efficiently. These
models provide an "elastic matching" property in both the horizontal
and vertical directions, which makes the recognizer not only
independent of size and slant but also tolerant of highly deformed
and noisy words. The system is evaluated on a synthetically created
database that contains about 26,000 words. Currently, we achieve a
recognition accuracy of 99% when words in the testing and training
sets are of the same font size, and 96% when they are of different
sizes. In the latter case, a conventional 1-D HMM achieves an
accuracy of only 70%.
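
The two-model decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' pseudo 2-D implementation: it uses a plain 1-D discrete HMM scored with log-domain Viterbi dynamic programming, and every name and number in it (viterbi_log_score, is_keyword, KEYWORD_MODEL, FILLER_MODEL, the two-symbol column alphabet and all probabilities) is a hypothetical toy choice made for illustration only.

```python
import math

NEG_INF = float("-inf")


def to_log(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def make_model(start, trans, emit):
    """Convert probability tables to log-probability tables."""
    return (
        [to_log(p) for p in start],
        [[to_log(p) for p in row] for row in trans],
        [[to_log(p) for p in row] for row in emit],
    )


def viterbi_log_score(obs, model):
    """Log-likelihood of the best state path for obs (dynamic programming)."""
    log_start, log_trans, log_emit = model
    n = len(log_start)
    score = [log_start[s] + log_emit[s][obs[0]] for s in range(n)]
    for symbol in obs[1:]:
        score = [
            max(score[p] + log_trans[p][s] for p in range(n)) + log_emit[s][symbol]
            for s in range(n)
        ]
    return max(score)


def is_keyword(obs, keyword_model, filler_model):
    """Maximum likelihood decision between the keyword and filler models."""
    return viterbi_log_score(obs, keyword_model) > viterbi_log_score(obs, filler_model)


# Toy alphabet (assumption): 0 = mostly white column, 1 = mostly black column.
# Hypothetical keyword model: left-to-right states "white, black, white".
KEYWORD_MODEL = make_model(
    start=[1.0, 0.0, 0.0],
    trans=[[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
    emit=[[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]],
)
# Hypothetical model for extraneous words: one state, uniform emissions.
FILLER_MODEL = make_model(start=[1.0], trans=[[1.0]], emit=[[0.5, 0.5]])

normal = [0, 0, 1, 1, 0, 0]
stretched = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]  # same shape, twice as wide
unrelated = [1, 0, 1, 0, 1, 0]

for name, obs in [("normal", normal), ("stretched", stretched), ("unrelated", unrelated)]:
    print(name, is_keyword(obs, KEYWORD_MODEL, FILLER_MODEL))
```

In this toy setup the self-transitions let the keyword model absorb the wider input, which is a 1-D analogue of the elastic matching the abstract describes. The pseudo 2-D models apply the same idea in both the horizontal and vertical directions.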