Natural language analysis for semantic document modeling

Authors

Brasethvik, T; Gulla, JA

Citation

T. Brasethvik and J.A. Gulla, Natural language analysis for semantic document modeling, DATA KN ENG, 38(1), 2001, pp. 45-62

Subject categories

AI Robotics and Automatic Control

Journal title

DATA & KNOWLEDGE ENGINEERING

ISSN journal

0169-023X

Volume

38

Issue

1

Year of publication

2001

Pages

45 - 62

Database

ISI

SICI code

0169-023X(200107)38:1<45:NLAFSD>2.0.ZU;2-S

Abstract

To ease the retrieval of documents published on the Web, the documents should be classified in a way that users find helpful and meaningful. This paper presents an approach to semantic document classification and retrieval based on natural language analysis and conceptual modeling. Users may define their own conceptual domain model, which is then used in combination with linguistic tools to define a controlled vocabulary for a document collection. Users may browse this domain model and interactively classify documents by selecting model fragments that describe the contents of the documents. Natural language tools are used to analyze the text of the documents and propose relevant model fragments in terms of selected domain model concepts and named relations. The proposed fragments are refined by the users and stored as document descriptions in RDF-XML format. For document retrieval, lexical analysis is used to preprocess search expressions and map these to the domain model for manual query-refinement. A prototype of the system is described, and the approach is illustrated with examples from a document collection published by the Norwegian Center for Medical Informatics (KITH). (C) 2001 published by Elsevier Science B.V.