DOCUMENT PROCESSING FOR AUTOMATIC KNOWLEDGE ACQUISITION

Citation
Y.Y. Tang et al., DOCUMENT PROCESSING FOR AUTOMATIC KNOWLEDGE ACQUISITION, IEEE Transactions on Knowledge and Data Engineering, 6(1), 1994, pp. 3-21
Number of citations
118
Subject categories
Information Science & Library Science; Computer Sciences, Special Topics; Engineering, Electrical & Electronic; Computer Science, Artificial Intelligence
Journal ISSN
1041-4347
Volume
6
Issue
1
Year of publication
1994
Pages
3 - 21
Database
ISI
SICI code
1041-4347(1994)6:1<3:DPFAKA>2.0.ZU;2-#
Abstract
The knowledge acquisition bottleneck has become the major impediment to the development and application of effective information systems. To remove this bottleneck, new document processing techniques must be introduced to acquire knowledge automatically from various types of documents. By presenting a survey of the techniques and problems involved, this paper aims to serve as a catalyst to stimulate research in automatic knowledge acquisition through document processing. In this study, a document is considered to have two structures: a geometric structure and a logical structure. These play a key role in knowledge acquisition, which can be viewed as the process of acquiring both structures. Extracting the geometric structure from a document is referred to as document analysis; mapping the geometric structure into the logical structure is regarded as document understanding. Both areas will be described in this paper, and the basic concept of document structure and its measurement based on entropy analysis will be introduced. Logical and geometric structure models are proposed. Both top-down and bottom-up approaches and their entropy analyses will be presented. Different techniques will be discussed with practical examples. Mapping methods, such as tree transformation, document formatting knowledge and document format description language, will also be described.