This paper surveys the basic concepts and underlying techniques of
document processing. A basic model for document processing is described. In
this model, document processing can be divided into two phases:
document analysis and document understanding. A document has two structures:
geometric (layout) structure and logical structure. Extracting the
geometric structure from a document is called document analysis;
mapping the geometric structure onto the logical structure is called
document understanding. Both types of document structure and the two areas of
document processing are discussed. Two categories of methods have been
used in document analysis, namely, (1) hierarchical methods, including
top-down and bottom-up approaches, and (2) non-hierarchical methods,
including the modified fractal signature. Tree transform, formatting
knowledge, and description language approaches have been used in
document understanding. Form document processing is discussed as a
particular case. Form description and form registration approaches
are presented. A form
processing system is also introduced. Finally, many of the techniques
used in these approaches are discussed, including skew detection, the
Hough transform, Gabor filters, projection, crossing counts, and form
definition language. Copyright (C) 1996 Pattern Recognition Society.