Stroke-model-based character extraction from gray-level document images

Citation
Xy. Ye et al., Stroke-model-based character extraction from gray-level document images, IEEE IM PR, 10(8), 2001, pp. 1152-1161
Number of citations
39
Subject categories
Electrical & Electronics Engineering
Journal title
IEEE TRANSACTIONS ON IMAGE PROCESSING
Journal ISSN
1057-7149
Volume
10
Issue
8
Year of publication
2001
Pages
1152 - 1161
Database
ISI
SICI code
1057-7149(200108)10:8<1152:SCEFGD>2.0.ZU;2-H
Abstract
Global gray-level thresholding techniques such as Otsu's method, and local gray-level thresholding techniques such as edge-based segmentation or adaptive thresholding, are powerful in extracting character objects from simple or slowly varying backgrounds. However, they are found to be insufficient when the backgrounds include sharply varying contours or fonts in different sizes. In this paper, a stroke model is proposed to depict the local features of character objects as double-edges in a predefined size. This model enables us to detect thin connected components selectively, while ignoring relatively large backgrounds that appear complex. Meanwhile, since the stroke width restriction is fully factored in, the proposed technique can be used to extract characters in predefined font sizes. To process large volumes of documents efficiently, a hybrid method is proposed for character extraction from various backgrounds. Using the measurement of class separability to differentiate images with simple backgrounds from those with complex backgrounds, the hybrid method can process documents with different backgrounds by applying the appropriate methods. Experiments on extracting handwriting from check images, as well as machine-printed characters from scene images, demonstrate the effectiveness of the proposed model.
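Illustrative sketch
The hybrid idea in the abstract can be sketched on a single scan line. This is a minimal illustration under stated assumptions, not the authors' implementation: class separability is assumed to be Otsu's ratio of between-class variance to total variance, the double-edge response is one plausible one-dimensional reading of "double-edges in a predefined size" for dark strokes on a brighter background, and the names `otsu_separability`, `double_edge_1d`, `extract_row`, `eta_min`, and `edge_min` (with their default values) are hypothetical.

```python
def otsu_separability(pixels, levels=256):
    """Return (threshold, eta) for integer gray levels in [0, levels).

    eta is the maximum between-class variance divided by the total
    variance (assumed here as the class separability measure).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    total = sum(i * h for i, h in enumerate(hist))
    mean = total / n
    var_total = sum(h * (i - mean) ** 2 for i, h in enumerate(hist)) / n
    if var_total == 0:
        return 0, 0.0  # flat signal: nothing to separate
    best_t, best_between = 0, 0.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # sum of levels of those pixels
    for t in range(levels - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = n - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (total - sum0) / w1
        between = (w0 / n) * (w1 / n) * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t, best_between / var_total


def double_edge_1d(row, width):
    """Respond to dark runs narrower than `width` pixels.

    For each pixel, look at pairs of points exactly `width` apart that
    straddle it; the response is large only if both are brighter.
    """
    out = [0] * len(row)
    for x in range(len(row)):
        best = 0
        for i in range(1, width):
            left, right = x - i, x + width - i
            if left < 0 or right >= len(row):
                continue
            best = max(best, min(row[left], row[right]) - row[x])
        out[x] = best
    return out


def extract_row(row, width, eta_min=0.9, edge_min=50):
    """Pick global thresholding or the stroke response per scan line."""
    t, eta = otsu_separability(row)
    if eta >= eta_min:  # simple background: global threshold suffices
        return [p <= t for p in row]
    return [r >= edge_min for r in double_edge_1d(row, width)]


# Thin stroke (2 px) next to a wide mid-gray background patch (8 px).
complex_row = [200] * 5 + [50] * 2 + [200] * 5 + [120] * 8 + [200] * 5
mask = extract_row(complex_row, width=4)
print([i for i, m in enumerate(mask) if m])
```

In this toy scan line the separability falls below the assumed cutoff, so the stroke response is used and the wide patch is ignored because no pair of points `width` apart straddles it with both ends on the brighter background.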