Cherry Blossom is a general-purpose Japanese document recognition
system developed at CEDAR. The input to the system can be facsimile
pages or images scanned at low resolution. Given a Japanese document
image, the system deskews the image, extracts text regions, segments
the text regions into text lines and then into characters, and
recognizes the character images as characters in JIS code. Two feature
sets, the Local Stroke Direction feature and the Gradient, Structural,
and Concavity feature, are used for character classification. Two
classification methods, the nearest neighbor classifier and the minimum
error subspace method, were designed and integrated to achieve better
performance. We also describe the new Japanese character image database
developed at CEDAR. This database consists of approximately 180,000
labeled character images from more than 3,300 categories, extracted
from diverse document images. Results of our system on this database
are also presented. © 1997 Pattern Recognition Society. Published by
Elsevier Science Ltd.