DOCUMENT-RETRIEVAL TOLERATING CHARACTER-RECOGNITION ERRORS - EVALUATION AND APPLICATION

Citation
K. Marukawa et al., DOCUMENT-RETRIEVAL TOLERATING CHARACTER-RECOGNITION ERRORS - EVALUATION AND APPLICATION, Pattern recognition, 30(8), 1997, pp. 1361-1371
Number of citations
17
Subject categories
Computer Sciences, Special Topics; Engineering, Electrical & Electronic; Computer Science, Artificial Intelligence
Journal title
Pattern Recognition
Journal ISSN
00313203
Volume
30
Issue
8
Year of publication
1997
Pages
1361 - 1371
Database
ISI
SICI code
0031-3203(1997)30:8<1361:DTCE-E>2.0.ZU;2-1
Abstract
This paper presents two methods of combining character recognition with techniques for retrieving Japanese documents and also shows how these methods can be applied to textual image retrieval. Both retrieval methods are tolerant of errors that occur during the character recognition process. The basic idea is to utilize the characteristics of recognition errors. One uses a confusion matrix to generate "equivalent" query strings that should match erroneously recognized text. The other one searches a "non-deterministic text" that contains multiple candidates for ambiguous recognition results. Simulation experiments have shown that both methods can effectively combine character recognition with retrieval techniques. (C) 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.
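Illustrative sketch (not taken from the paper)
The two ideas named in the abstract can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the authors' implementation: the confusion table `CONFUSIONS`, the function names, and the use of Latin characters in place of Japanese text are all hypothetical, and the record does not say how the paper builds its confusion matrix or ranks candidates.

```python
from itertools import product

# Hypothetical confusion table: for each intended character, the characters
# a recognizer might output instead. Invented for illustration only.
CONFUSIONS = {
    "l": ["1", "I"],
    "o": ["0"],
}


def equivalent_queries(query, confusions):
    """Expand a query into every string reachable by confusable substitutions."""
    options = [[ch] + confusions.get(ch, []) for ch in query]
    return ["".join(combo) for combo in product(*options)]


def search_expanded(query, ocr_text, confusions):
    """Idea 1: match any 'equivalent' query string against recognized text."""
    return any(q in ocr_text for q in equivalent_queries(query, confusions))


def search_nondeterministic(query, candidates):
    """Idea 2: match a query against text stored as per-position candidate sets.

    `candidates` is a list of sets; each set holds the recognizer's
    alternative readings for one character position.
    Returns the start positions at which the query matches.
    """
    m = len(query)
    hits = []
    for start in range(len(candidates) - m + 1):
        if all(query[i] in candidates[start + i] for i in range(m)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # The recognizer misread "hello" as "he1l0"; the expanded query still matches.
    print(search_expanded("hello", "he1l0 world", CONFUSIONS))

    # The same word stored with multiple candidates at ambiguous positions.
    text = [{"h"}, {"e"}, {"l", "1"}, {"l", "1"}, {"o", "0"}]
    print(search_nondeterministic("hello", text))
```

In this sketch the first approach puts the cost on the query side, since the number of expanded strings grows multiplicatively with query length, while the second keeps the query unchanged and stores the ambiguity in the indexed text.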