Automatic indexing of full texts for the Gruner plus Jahr press database

Authors
K. Rapke
Citation
K. Rapke, Automatic indexing of full texts for the Gruner plus Jahr press database, NFD INF-WIS, 52(5), 2001, pp. 251-262
Number of citations
15
Subject categories
Library & Information Science
Journal title
NFD INFORMATION-WISSENSCHAFT UND PRAXIS
ISSN journal
1434-4653
Volume
52
Issue
5
Year of publication
2001
Pages
251 - 262
Database
ISI
SICI code
1434-4653(200107/08)52:5<251:AIOFTF>2.0.ZU;2-N
Abstract
Retrieval tests are the most widely recognized method for justifying new information retrieval methods over classic ones. In this diploma thesis, two fundamentally different systems for automatic indexing are tested and evaluated on the Gruner+Jahr press database, comparing natural-language (NLP) retrieval with Boolean retrieval. The two systems are Autonomy, by Autonomy Inc., and DocCat, which IBM adapted to the structure of the Gruner+Jahr press database. Autonomy is a probabilistic retrieval system based on natural-language retrieval, whereas DocCat is based on Boolean retrieval. DocCat uses learning algorithms and indexes documents on the basis of a manually annotated training corpus. Methodologically, the evaluation assumes a real-world environment within Gruner+Jahr's text documentation. The test results are assessed for both statistical and qualitative significance. One result is that DocCat falls short of intellectual (human) information retrieval and must be refined to overcome these problems. The other is that the tested Autonomy software does not meet the specific requirements of Gruner+Jahr's text documentation.