STRUCTURE RECOGNITION OF VARIOUS KINDS OF TABLE-FORM DOCUMENTS

Citation
Q. Luo et al., STRUCTURE RECOGNITION OF VARIOUS KINDS OF TABLE-FORM DOCUMENTS, Systems and Computers in Japan, 25(10), 1994, pp. 82-97
Number of citations
15
Subject Categories
Computer Science Hardware & Architecture; Computer Science Information Systems; Computer Science Theory & Methods
Journal ISSN
0882-1666
Volume
25
Issue
10
Year of publication
1994
Pages
82 - 97
Database
ISI
SICI code
0882-1666(1994)25:10<82:SROVKO>2.0.ZU;2-V
Abstract
Recognizing the structure of a document means discriminating its layout structure, i.e., the two-dimensional configuration and format of the document, and identifying the individual item data. Most studies of this kind so far, however, are based on a paradigm in which the information concerning the document structure is defined beforehand for a particular type of document and is used as a knowledge base. Such a paradigm is successful in recognizing documents with the same structure or structures of the same kind, but it is not applicable when various kinds of document structures are mixed. This paper takes table-form documents as the objects of processing and reports on a method that can recognize the document structures of various kinds of table-form documents. Table-form documents fall into various classes, with various configurations and contents, according to their use and the adjacency relationships between item fields. To recognize the document structure accurately for various kinds of table-form documents, it is essential to develop a processing method based on the information for each class of table-form documents. For this purpose, a classification tree is used, which hierarchically manages the information for each case of table-form documents. A structure recognition system for multiple kinds of table-form documents is realized with this framework; it includes recognition of the table-form document class, automatic acquisition of layout structure information, and recognition of the document structure.