A DIFFERENTIAL-PROCESSING EXTRACTION APPROACH TO TEXT AND IMAGE SEGMENTATION

Citation
Gw. Leng et al., A DIFFERENTIAL-PROCESSING EXTRACTION APPROACH TO TEXT AND IMAGE SEGMENTATION, Engineering Applications of Artificial Intelligence, 7(6), 1994, pp. 639-651
Number of citations
9
Subject Categories
Computer Application, Chemistry & Engineering; Computer Science, Artificial Intelligence; Engineering
ISSN journal
0952-1976
Volume
7
Issue
6
Year of publication
1994
Pages
639 - 651
Database
ISI
SICI code
0952-1976(1994)7:6<639:ADEATT>2.0.ZU;2-W
Abstract
To store the information found in paper documents efficiently, text and non-text regions need to be separated. Non-text regions include half-tone photographs and line diagrams. The text regions can be converted (via an optical character reader) to a computer-searchable form, and the non-text regions can be extracted and preserved in compressed form using image-compression algorithms. In this paper, an effective system for automatically segmenting a document image into regions of text and non-text is proposed. The system first performs adaptive thresholding to obtain a binarized image. Subsequently, the binarized image is smeared using a run-length differential algorithm. The smeared image is then subjected to a text characteristic filter to remove erroneous smearing of non-text regions. Next, baseline cumulative blocking is used to rectangularize the smeared region. Finally, a text block growing algorithm is used to block out a text sentence. The recognition of text is carried out on a text sentence basis.
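
The first two stages named in the abstract (adaptive thresholding, then run-length smearing) can be illustrated with a minimal sketch. This record gives no details of the paper's run-length differential algorithm, so the code below substitutes two generic stand-ins: a local-mean threshold and classic horizontal run-length smoothing. The function names and the `window`, `offset`, and `max_gap` parameters are hypothetical choices for illustration, not taken from the paper.

```python
def adaptive_threshold(gray, window=3, offset=0):
    """Binarize a grayscale image given as a list of rows.

    Assumption: a pixel counts as ink (1) when it is darker than the mean
    of its local window minus `offset`. This is a generic local-mean rule,
    not the thresholding method of the paper.
    """
    height, width = len(gray), len(gray[0])
    radius = window // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clipped at the image border.
            vals = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(vals) / len(vals)
            out[y][x] = 1 if gray[y][x] < mean - offset else 0
    return out


def smear_row(row, max_gap):
    """Fill background gaps of at most `max_gap` pixels between ink pixels.

    Assumption: this is plain run-length smoothing, used here as a stand-in
    for the paper's run-length differential algorithm.
    """
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen
    for x, value in enumerate(row):
        if value == 1:
            gap = x - last_ink - 1 if last_ink is not None else 0
            if 0 < gap <= max_gap:
                for i in range(last_ink + 1, x):
                    out[i] = 1
            last_ink = x
    return out


def smear_image(binary, max_gap):
    """Apply horizontal smearing to every row of a binarized image."""
    return [smear_row(row, max_gap) for row in binary]
```

With `max_gap` set near the typical spacing between characters, adjacent characters on a line merge into a single run, while wider gaps (for example, between columns) stay open. The later stages in the abstract (text characteristic filter, baseline cumulative blocking, text block growing) are not sketched because the record does not describe how they work.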