Multithresholding of mixed-type documents

Citation
C. Strouthopoulos and N. Papamarkos, Multithresholding of mixed-type documents, ENG APP ART, 13(3), 2000, pp. 323-343
Number of citations
28
Subject categories
AI Robotics and Automatic Control
Journal title
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
ISSN journal
0952-1976
Volume
13
Issue
3
Year of publication
2000
Pages
323 - 343
Database
ISI
SICI code
0952-1976(200006)13:3<323:MOMD>2.0.ZU;2-E
Abstract
Mixed-type documents include text, drawings and graphics regions. It is obvious that a technique that can reduce the number of gray levels in accordance with the type of each document region could be important for many document applications, such as storage, transmission and recognition. To solve this problem, this paper proposes a new method, called the document multithresholding technique. The method is based on a page layout analysis (PLA) technique and on a neural-network multilevel threshold-selection approach. The proposed technique is applicable to any mixed-type document and achieves document multithresholding by taking advantage of the types of the document blocks. Thus, in the final document, different block types are stored with appropriate and limited numbers of gray-level values. The proposed method includes two main steps. First, a PLA technique is applied, which classifies the document blocks into text, line-drawing and graphics regions. In the second stage, a new neural-network multithresholding technique is applied to each of the document blocks. In text and line-drawing blocks, only one threshold is determined, whereas in the graphics blocks the optimal number of thresholds is first determined. The performance of the method has been extensively tested on a variety of documents. Several examples illustrate the strength and the effectiveness of the proposed methodology. (C) 2000 Elsevier Science Ltd. All rights reserved.
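The second step described in the abstract (one threshold for text and line-drawing blocks, several for graphics blocks) can be illustrated with a minimal sketch. This is not the authors' method: Otsu's criterion and quantile-based thresholds stand in for the neural-network threshold selection, the block type is assumed to be already known from a PLA step, and the names `otsu_threshold`, `reduce_block` and the fixed `n_levels` parameter are hypothetical. The paper determines the number of thresholds for graphics blocks automatically, which this sketch does not attempt.

```python
import numpy as np


def otsu_threshold(pixels):
    """Gray level t maximizing Otsu's between-class variance (pixels <= t form class 0).

    Stand-in for the paper's neural-network threshold selection.
    Expects an array of 8-bit gray levels (dtype uint8).
    """
    hist = np.bincount(pixels.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)             # weight of class 0 for each candidate t
    mu = np.cumsum(prob * levels)    # cumulative first moment up to t
    mu_total = mu[-1]
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)      # skip thresholds that leave a class empty
    between = np.zeros(256)
    between[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(between))


def reduce_block(pixels, block_type, n_levels=4):
    """Reduce the gray levels of one document block according to its type.

    block_type is assumed to come from a page layout analysis step.
    n_levels is a hypothetical fixed setting used only for graphics blocks.
    """
    if block_type in ("text", "line-drawing"):
        # Text and line-drawing blocks: a single threshold, two output levels.
        thresholds = np.array([otsu_threshold(pixels)], dtype=float)
    else:
        # Graphics blocks: n_levels - 1 thresholds placed at equal quantiles.
        qs = np.arange(1, n_levels) / n_levels
        thresholds = np.unique(np.quantile(pixels, qs))
    # Class index of every pixel: values <= thresholds[0] fall in class 0, and so on.
    classes = np.digitize(pixels, thresholds, right=True)
    out = np.zeros_like(pixels)
    for c in np.unique(classes):
        mask = classes == c
        # Represent each class by its rounded mean gray level.
        out[mask] = int(round(pixels[mask].mean()))
    return out


# Usage: a synthetic two-tone "text" block and a gray ramp as a "graphics" block.
text_block = np.array([[50, 200], [200, 50]], dtype=np.uint8)
graphics_block = np.arange(256, dtype=np.uint8).reshape(16, 16)
text_out = reduce_block(text_block, "text")
graphics_out = reduce_block(graphics_block, "graphics", n_levels=4)
```

Representing each class by its mean gray level is one simple choice for the stored values; any other representative per class would serve the same purpose of limiting the number of gray levels per block.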