To ease the retrieval of documents published on the Web, the documents should
be classified in a way that users find helpful and meaningful. This paper
presents an approach to semantic document classification and retrieval based
on natural language analysis and conceptual modeling. Users may define their
own conceptual domain model, which is then used in combination with linguistic
tools to define a controlled vocabulary for a document collection. Users may
browse this domain model and interactively classify documents by selecting
model fragments that describe the contents of the documents. Natural language
tools are used to analyze the text of the documents and propose relevant model
fragments in terms of selected domain model concepts and named relations. The
proposed fragments are refined by the users and stored as document
descriptions in RDF-XML format. For document retrieval, lexical analysis is
used to preprocess search expressions and map these to the domain model for
manual query-refinement. A prototype of the system is described, and the
approach is illustrated with examples from a document collection published by
the Norwegian Center for Medical Informatics (KITH).
(C) 2001 published by Elsevier Science B.V.