This study proposes a method for implementing three functions that can greatly assist a traditional OCCR (Optical Chinese Character Recognition) system: (1) identifying the font used in a document; (2) detecting and recognizing the most frequently used (MFU) characters; and (3) distinguishing between machine-printed and handwritten characters. According to a study by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310-316), the top-40 MFU characters account for about 20% of the Chinese characters in a text document. If these MFU characters can be detected before the traditional OCCR method is applied, computation time can be reduced substantially.
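To make the coverage figure concrete, the following minimal sketch measures what fraction of a text is accounted for by its k most frequent characters. The function name `top_k_coverage` is hypothetical, and the ranking is computed from the given text itself rather than from the corpus-wide MFU list used by Chang and Chen, so it only illustrates the kind of statistic being cited.

```python
from collections import Counter


def top_k_coverage(text: str, k: int = 40) -> float:
    """Fraction of non-whitespace characters in `text` that belong to
    the k most frequent characters of that same text.

    Assumption: the ranking comes from `text` itself, not from an
    external MFU list.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    counts = Counter(chars)
    # Sum the occurrence counts of the k most frequent characters.
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(chars)
```

Running this with `k=40` on a large Chinese corpus would be one way to check how close a given collection comes to the roughly 20% reported above.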
The proposed method for character detection consists of three stages: segmentation, feature extraction, and classification. In the first stage, the projection-profile-based method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is used to segment individual characters from the input text document. In the second stage, three types of features are introduced: the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether a segmented character is semi-matched or fully matched with an MFU template. In the last stage, based on the matching result, three different algorithms are provided for implementing the aforementioned functions. Experimental results are given to demonstrate the practicality and superiority of the proposed method. (C) 2001 Elsevier Science B.V. All rights reserved.
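The first two stages can be illustrated with a minimal sketch, assuming a single text line stored as a binary image (1 for a black pixel, 0 for white). This is a generic projection-profile segmentation, not the specific method of Wang et al., and the names `column_profile`, `segment_columns`, and `black_density` are hypothetical. Only the black-pixel density feature is shown; the projection profile code and the modified skeleton template are not specified in enough detail here to reproduce.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = black pixel, 0 = white


def column_profile(img: Image) -> List[int]:
    """Vertical projection profile: black-pixel count of each column."""
    if not img:
        return []
    return [sum(col) for col in zip(*img)]


def segment_columns(img: Image) -> List[Tuple[int, int]]:
    """Split a text line into (start, end) column spans, end exclusive,
    wherever the projection profile drops to zero."""
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(column_profile(img)):
        if count > 0 and start is None:
            start = x                  # a character begins
        elif count == 0 and start is not None:
            spans.append((start, x))   # a blank column ends it
            start = None
    if start is not None:              # character touching the right edge
        spans.append((start, len(img[0])))
    return spans


def black_density(img: Image, span: Tuple[int, int]) -> float:
    """Ratio of black pixels to total pixels inside a column span."""
    start, end = span
    area = len(img) * (end - start)
    if area == 0:
        return 0.0
    black = sum(sum(row[start:end]) for row in img)
    return black / area
```

A segmented span whose density is far from that of every MFU template could be rejected cheaply before any more expensive matching is attempted, which is the kind of early filtering the feature extraction stage suggests.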