AUTOMATIC DOCUMENT PROCESSING - A SURVEY

Citation
Yy. Tang et al., AUTOMATIC DOCUMENT PROCESSING - A SURVEY, Pattern Recognition, 29(12), 1996, pp. 1931-1952
Number of citations
105
Subject Categories
Computer Sciences, Special Topics; Engineering, Electrical & Electronic; Computer Science, Artificial Intelligence
Journal title
Pattern Recognition
ISSN journal
00313203
Volume
29
Issue
12
Year of publication
1996
Pages
1931 - 1952
Database
ISI
SICI code
0031-3203(1996)29:12<1931:ADP-AS>2.0.ZU;2-1
Abstract
Surveys of the basic concepts and underlying techniques are presented in this paper. A basic model for document processing is described. In this model, document processing can be divided into two phases: document analysis and document understanding. A document has two structures: geometric (layout) structure and logical structure. Extraction of the geometric structure from a document refers to document analysis; mapping the geometric structure into logical structure deals with document understanding. Both types of document structures and the two areas of document processing are discussed. Two categories of methods have been used in document analysis, namely, (1) hierarchical methods including top-down and bottom-up approaches, (2) non-hierarchical methods including modified fractal signature. Tree transform, formatting knowledge and description language approaches have been used in document understanding. A particular case of form document processing is discussed. Form description and form registration approaches are presented. A form processing system is also introduced. Finally, many techniques, such as skew detection, Hough transform, Gabor filters, projection, crossing counts, form definition language, etc., which have been used in these approaches are discussed. Copyright (C) 1996 Pattern Recognition Society.
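The abstract lists projection among the techniques used in top-down document analysis. As a rough illustration only, the sketch below computes a horizontal projection profile of a binary page image and cuts it at empty rows to obtain candidate text blocks. It is not taken from the paper: the function names `horizontal_projection` and `split_on_gaps`, the `min_gap` parameter, and the list-of-lists image representation (1 = ink, 0 = background) are all assumptions made for this example.

```python
from typing import List, Tuple


def horizontal_projection(image: List[List[int]]) -> List[int]:
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def split_on_gaps(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges, end exclusive, of non-empty runs.

    Two runs are kept separate only if at least `min_gap` consecutive
    empty rows lie between them (hypothetical parameter for this sketch).
    """
    blocks: List[Tuple[int, int]] = []
    start = None  # first row of the run currently being tracked
    gap = 0       # number of consecutive empty rows seen inside the run
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The run ended at the last non-empty row before the gap.
                blocks.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close a run that reaches the end, dropping trailing empty rows.
        blocks.append((start, len(profile) - gap))
    return blocks


# Toy page: two inked rows, two blank rows, one inked row.
page = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
    [1, 0, 0],
]
profile = horizontal_projection(page)
blocks = split_on_gaps(profile, min_gap=2)
```

Applying the same cut to column sums inside each block, and repeating, would give a recursive top-down decomposition; a larger `min_gap` merges lines into paragraph-sized blocks.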