MACHINE-PRINTED JAPANESE DOCUMENT RECOGNITION

Authors

SRIHARI SN; HONG T; SRIKANTAN G

Citation

S.N. Srihari et al., MACHINE-PRINTED JAPANESE DOCUMENT RECOGNITION, Pattern Recognition, 30(8), 1997, pp. 1301-1313

Subject categories

Computer Sciences, Special Topics; Engineering, Electrical & Electronic; Computer Science, Artificial Intelligence

Journal title

Pattern Recognition

ISSN journal

0031-3203

Volume

30

Issue

8

Year of publication

1997

Pages

1301 - 1313

Database

ISI

SICI code

0031-3203(1997)30:8<1301:MJDR>2.0.ZU;2-4

Abstract

Cherry Blossom is a general-purpose Japanese document recognition system developed at CEDAR. The input to the system can be facsimile pages or images scanned at low resolution. Given a Japanese document image, the system deskews the image, extracts text regions, segments text regions into text lines and further into characters, and recognizes character images as characters in JIS code. Two feature sets, the Local Stroke Direction feature and the Gradient, Structural, and Concavity feature, are used for character classification. Two classification methods, the nearest neighbor classifier and the minimum error subspace method, have been designed and they have been integrated to achieve better performance. We also describe the new Japanese character image database developed at CEDAR. This database consists of approximately 180,000 labeled character images from more than 3300 categories, extracted from diverse document images. Results of our system on this dataset are also presented. (C) 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.
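
As an illustration of one of the two classification methods named in the abstract, the following is a minimal sketch of a generic nearest neighbor classifier over feature vectors. It is not the Cherry Blossom implementation: the function name, the Euclidean distance, the three-element feature vectors, and the labels are all assumptions made for the example, and the abstract does not describe how the actual features are computed or compared.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def nearest_neighbor(query, prototypes):
    """Return (label, distance) of the prototype closest to `query`.

    `prototypes` is a list of (label, feature_vector) pairs.
    Euclidean distance is an assumption; the paper's metric is not
    stated in the abstract.
    """
    best_label, best_distance = None, float("inf")
    for label, vector in prototypes:
        d = dist(query, vector)
        if d < best_distance:
            best_label, best_distance = label, d
    return best_label, best_distance


# Toy prototypes with invented labels and feature values (hypothetical).
PROTOTYPES = [
    ("A", [0.0, 0.0, 1.0]),
    ("B", [1.0, 0.0, 0.0]),
    ("C", [0.0, 1.0, 0.0]),
]

label, distance = nearest_neighbor([0.9, 0.1, 0.0], PROTOTYPES)
print(label)  # prints "B"
```

In a real system the labels would be character codes (the abstract mentions JIS code) and the vectors would come from a feature extractor; both are outside the scope of this sketch.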