Natural language analysis for semantic document modeling

Citation
T. Brasethvik and J.A. Gulla, Natural language analysis for semantic document modeling, DATA KN ENG, 38(1), 2001, pp. 45-62
Number of citations
52
Subject categories
AI Robotics and Automatic Control
Journal title
DATA & KNOWLEDGE ENGINEERING
ISSN journal
0169-023X
Volume
38
Issue
1
Year of publication
2001
Pages
45 - 62
Database
ISI
SICI code
0169-023X(200107)38:1<45:NLAFSD>2.0.ZU;2-S
Abstract
To ease the retrieval of documents published on the Web, the documents should be classified in a way that users find helpful and meaningful. This paper presents an approach to semantic document classification and retrieval based on natural language analysis and conceptual modeling. Users may define their own conceptual domain model, which is then used in combination with linguistic tools to define a controlled vocabulary for a document collection. Users may browse this domain model and interactively classify documents by selecting model fragments that describe the contents of the documents. Natural language tools are used to analyze the text of the documents and propose relevant model fragments in terms of selected domain model concepts and named relations. The proposed fragments are refined by the users and stored as document descriptions in RDF-XML format. For document retrieval, lexical analysis is used to preprocess search expressions and map these to the domain model for manual query-refinement. A prototype of the system is described, and the approach is illustrated with examples from a document collection published by the Norwegian Center for Medical Informatics (KITH). (C) 2001 published by Elsevier Science B.V.