Automatic indexing of full texts for the Gruner plus Jahr press database

Authors

Rapke, K

Citation

K. Rapke, Automatic indexing of full texts for the Gruner plus Jahr press database, NFD INF-WIS, 52(5), 2001, pp. 251-262

Subject categories

Library & Information Science

Journal title

NFD INFORMATION-WISSENSCHAFT UND PRAXIS

ISSN journal

1434-4653

Volume

52

Issue

5

Year of publication

2001

Pages

251 - 262

Database

ISI

SICI code

1434-4653(200107/08)52:5<251:AIOFTF>2.0.ZU;2-N

Abstract

Retrieval tests are the most widely recognized method for justifying new information retrieval methods against classic ones. In this diploma thesis, two fundamentally different systems for automatic indexing are tested and evaluated on the Gruner+Jahr press database, comparing natural-language retrieval (NLP) with Boolean retrieval. The two systems are Autonomy, by Autonomy Inc., and DocCat, which IBM adapted to the structure of the Gruner+Jahr press database. The former is a probabilistic retrieval system based on natural-language retrieval, whereas DocCat is based on Boolean retrieval. DocCat is a system with learning algorithms that indexes on the basis of a manually annotated training corpus. Methodologically, the evaluation assumes a real-world environment in the context of text documentation at Gruner+Jahr. The tests are evaluated for both statistical and qualitative significance. One result is that DocCat is deficient in comparison with intellectual information retrieval and has to be refined to solve these problems. The other result is that the tested Autonomy software does not meet the specific requirements of Gruner+Jahr text documentation.