CLASSIFICATION OF DOCUMENTS BY FORM AND CONTENT

Citation
G. Maderlechner et al., CLASSIFICATION OF DOCUMENTS BY FORM AND CONTENT, Pattern recognition letters, 18(11-13), 1997, pp. 1225-1231
Citations number
8
Journal title
Pattern recognition letters
ISSN journal
01678655
Volume
18
Issue
11-13
Year of publication
1997
Pages
1225 - 1231
Database
ISI
SICI code
0167-8655(1997)18:11-13<1225:CODBFA>2.0.ZU;2-E
Abstract
This paper presents a modular software system, which classifies a large variety of office documents according to layout form and textual content. It consists of the following components: layout analysis, pre-classification, OCR interface, fuzzy string matching, text categorization, lexical, syntactical and semantic analysis. The system has been applied to the following tasks: presorting of forms, reports and letters, index extraction for archiving and retrieval, page type classification and text column analysis of real estate register documents, in-house mail sorting and electronic distribution to departments. The architecture, modules, and practical results are described. (C) 1997 Elsevier Science B.V.