MACHINE-PRINTED JAPANESE DOCUMENT RECOGNITION

Citation
S.N. Srihari et al., MACHINE-PRINTED JAPANESE DOCUMENT RECOGNITION, Pattern Recognition, 30(8), 1997, pp. 1301-1313
Number of citations
28
Subject Categories
Computer Sciences, Special Topics; Engineering, Electrical & Electronic; Computer Science, Artificial Intelligence
Journal title
Pattern Recognition
Journal ISSN
0031-3203
Volume
30
Issue
8
Year of publication
1997
Pages
1301 - 1313
Database
ISI
SICI code
0031-3203(1997)30:8<1301:MJDR>2.0.ZU;2-4
Abstract
Cherry Blossom is a general-purpose Japanese document recognition system developed at CEDAR. The input to the system can be facsimile pages or images scanned at low resolution. Given a Japanese document image, the system deskews the image, extracts text regions, segments text regions into text lines and further into characters, and recognizes character images as characters in JIS code. Two feature sets, the Local Stroke Direction feature and the Gradient, Structural, and Concavity feature, are used for character classification. Two classification methods, the nearest neighbor classifier and the minimum error subspace method, have been designed and they have been integrated to achieve better performance. We also describe the new Japanese character image database developed at CEDAR. This database consists of approximately 180,000 labeled character images from more than 3300 categories, extracted from diverse document images. Results of our system on this dataset are also presented. (C) 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.
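
The abstract names a nearest neighbor classifier as one of the two classification methods. As a rough illustration of that general idea only, and not of the Cherry Blossom implementation, the sketch below assigns a query feature vector the label of its closest stored prototype. The Euclidean metric, the three-dimensional feature vectors, the labels "A", "B", "C", and the function name `nearest_neighbor` are all assumptions made for the example; the record does not describe the paper's actual features, metric, or data layout at this level.

```python
import math


def nearest_neighbor(query, prototypes):
    """Return (label, distance) of the prototype closest to `query`.

    `prototypes` is a list of (feature_vector, label) pairs. Euclidean
    distance is an assumption; the source does not state the metric used.
    """
    best_label, best_dist = None, math.inf
    for features, label in prototypes:
        dist = math.dist(query, features)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Toy prototypes with made-up features and placeholder labels
# (hypothetical stand-ins for labeled character images and JIS codes).
prototypes = [
    ((0.0, 0.0, 1.0), "A"),
    ((1.0, 0.0, 0.0), "B"),
    ((0.0, 1.0, 0.0), "C"),
]

label, dist = nearest_neighbor((0.9, 0.1, 0.0), prototypes)
print(label, round(dist, 4))
```

A linear scan like this is only practical for small prototype sets; with thousands of categories, as in the database described above, a real system would likely need some form of candidate pruning, which is not sketched here.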