The knowledge acquisition bottleneck has become the major impediment
to the development and application of effective information systems. To
remove this bottleneck, new document processing techniques must be
introduced to automatically acquire knowledge from various types of
documents. By presenting a survey of the techniques and problems involved,
this paper aims to serve as a catalyst for research in automatic
knowledge acquisition through document processing. In this study, a
document is considered to have two structures: a geometric structure
and a logical structure. These play a key role in the process of
knowledge acquisition, which can be viewed as a process of acquiring
these two structures. Extracting the geometric structure from a document
is referred to as document analysis; mapping the geometric structure
into the logical structure is regarded as document understanding. Both
areas will
be described in this paper, and the basic concept of document structure
and its measurement based on entropy analysis will be introduced.
Logical structure and geometric models are proposed. Both top-down and
bottom-up approaches and their entropy analyses will be presented.
Different techniques will be discussed with practical examples. Mapping
methods, such as tree transformation, document formatting knowledge and
document format description language, will also be described.
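
As a rough illustration of the distinction between the two structures, the following minimal Python sketch maps a flat list of geometric blocks (rectangles on a page) to a small logical tree. It is not the method of any particular system surveyed here: the names Block, LogicalNode and to_logical, the font-size thresholds, and the sample data are all hypothetical and chosen only to show the idea of a rule-based geometric-to-logical mapping.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Geometric structure: a rectangle on the page plus its content (hypothetical model)."""
    x: int
    y: int
    width: int
    height: int
    text: str
    font_size: float


@dataclass
class LogicalNode:
    """Logical structure: a labelled node such as title, section, or paragraph."""
    label: str
    text: str = ""
    children: list = field(default_factory=list)


def to_logical(blocks, body_size=10.0):
    """Map geometric blocks to a logical tree using assumed font-size rules."""
    root = LogicalNode("document")
    section = None
    # Assumed reading order: top to bottom, then left to right.
    for b in sorted(blocks, key=lambda b: (b.y, b.x)):
        if b.font_size >= 1.8 * body_size:
            # Much larger than body text: treat as the document title.
            root.children.append(LogicalNode("title", b.text))
        elif b.font_size >= 1.2 * body_size:
            # Moderately larger than body text: treat as a section heading.
            section = LogicalNode("section", b.text)
            root.children.append(section)
        else:
            # Body-sized text: attach to the current section, or to the root.
            parent = section if section is not None else root
            parent.children.append(LogicalNode("paragraph", b.text))
    return root


# Sample blocks, deliberately given out of reading order.
blocks = [
    Block(50, 120, 500, 60, "Body text of the first section.", 10.0),
    Block(50, 20, 500, 30, "A Sample Title", 20.0),
    Block(50, 80, 500, 20, "1. Introduction", 13.0),
]
tree = to_logical(blocks)
```

Here the geometric information (position and font size) is used only to decide the logical label and nesting of each block; a real system would draw on much richer layout knowledge than these two assumed thresholds.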