A paper document processing system is an information system component which
transforms information on printed or handwritten documents into a
computer-revisable form. In intelligent systems for paper document
processing this
information capture process is based on knowledge of the specific layout
and logical structures of the documents. This article proposes the
application of machine learning techniques to acquire the specific
knowledge required
by an intelligent document processing system, named WISDOM++, that manages
printed documents, such as letters and journals. Knowledge is represented
by means of decision trees and first-order rules automatically generated
from a set of training documents. In particular, an incremental decision tree
learning system is applied for the acquisition of decision trees used for
the classification of segmented blocks, while a first-order learning system
is applied for the induction of rules used for the layout-based
classification and understanding of documents. Issues concerning the
incremental induction of decision trees and the handling of both numeric
and symbolic data
in first-order rule learning are discussed, and the validity of the
proposed solutions is empirically evaluated by processing a set of real
printed documents.
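
To make the idea of a first-order rule over both numeric and symbolic data
more concrete, the following minimal sketch shows how one such rule could
be applied to the segmented blocks of a page. It is an illustration only
and is not taken from WISDOM++: the Block fields, the on_top predicate,
the rule itself and its thresholds are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A segmented layout block. All fields are illustrative assumptions."""
    block_id: str
    kind: str    # symbolic attribute, e.g. "text", "image", "hor_line"
    x: int       # numeric attributes: top-left corner and size in pixels
    y: int
    width: int
    height: int


def on_top(a: Block, b: Block) -> bool:
    """Relational predicate: block a lies entirely above block b."""
    return a.y + a.height <= b.y


def looks_like_letter(blocks: list[Block]) -> bool:
    """Apply one hypothetical first-order rule to a page.

    letter(D) <- part_of(D, A), part_of(D, B),
                 type(A) = image, height(A) in [20..120],
                 type(B) = text, on_top(A, B).

    The rule mixes symbolic tests (block type), a numeric test
    (a height interval) and a relation between two blocks.
    """
    return any(
        a.kind == "image" and 20 <= a.height <= 120
        and b.kind == "text" and on_top(a, b)
        for a in blocks
        for b in blocks
        if a is not b
    )
```

A page would be passed as a list of Block objects, for example
looks_like_letter([logo, body]). The rule holds only if some pair of
distinct blocks satisfies every condition at once, which is what sets a
first-order rule apart from a test on the attributes of a single block.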