Machine learning for intelligent processing of printed documents

Citation
F. Esposito et al., Machine learning for intelligent processing of printed documents, J INTELL IN, 14(2-3), 2000, pp. 175-198
Number of citations
34
Subject Categories
Information Technology & Communication Systems
Journal title
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
ISSN journal
0925-9902
Volume
14
Issue
2-3
Year of publication
2000
Pages
175 - 198
Database
ISI
SICI code
0925-9902(200003)14:2-3<175:MLFIPO>2.0.ZU;2-I
Abstract
A paper document processing system is an information system component which transforms information on printed or handwritten documents into a computer-revisable form. In intelligent systems for paper document processing this information capture process is based on knowledge of the specific layout and logical structures of the documents. This article proposes the application of machine learning techniques to acquire the specific knowledge required by an intelligent document processing system, named WISDOM++, that manages printed documents, such as letters and journals. Knowledge is represented by means of decision trees and first-order rules automatically generated from a set of training documents. In particular, an incremental decision tree learning system is applied for the acquisition of decision trees used for the classification of segmented blocks, while a first-order learning system is applied for the induction of rules used for the layout-based classification and understanding of documents. Issues concerning the incremental induction of decision trees and the handling of both numeric and symbolic data in first-order rule learning are discussed, and the validity of the proposed solutions is empirically evaluated by processing a set of real printed documents.