Machine learning for intelligent processing of printed documents

Authors

Esposito, F; Malerba, D; Lisi, FA

Citation

F. Esposito et al., Machine learning for intelligent processing of printed documents, J INTELL IN, 14(2-3), 2000, pp. 175-198

Citations number

Subject Categories

Information Technology & Communication Systems

Journal title

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS

ISSN journal

0925-9902

Volume

14

Issue

2-3

Year of publication

2000

Pages

175 - 198

Database

ISI

SICI code

0925-9902(200003)14:2-3<175:MLFIPO>2.0.ZU;2-I

Abstract

A paper document processing system is an information system component which transforms information on printed or handwritten documents into a computer-revisable form. In intelligent systems for paper document processing this information capture process is based on knowledge of the specific layout and logical structures of the documents. This article proposes the application of machine learning techniques to acquire the specific knowledge required by an intelligent document processing system, named WISDOM++, that manages printed documents, such as letters and journals. Knowledge is represented by means of decision trees and first-order rules automatically generated from a set of training documents. In particular, an incremental decision tree learning system is applied for the acquisition of decision trees used for the classification of segmented blocks, while a first-order learning system is applied for the induction of rules used for the layout-based classification and understanding of documents. Issues concerning the incremental induction of decision trees and the handling of both numeric and symbolic data in first-order rule learning are discussed, and the validity of the proposed solutions is empirically evaluated by processing a set of real printed documents.