Test collections have traditionally been used by information retrieval researchers to improve their retrieval strategies. To be viable as a laboratory tool, a collection must reliably rank different retrieval variants according to their true effectiveness. In particular, the relative effectiveness of two retrieval strategies should be insensitive to modest changes in the relevant document set, since individual relevance assessments are known to vary widely.
The test collections developed in the TREC workshops have become the collections of choice in the retrieval research community. To verify their reliability, NIST investigated the effect that changes in the relevance assessments have on the evaluation of retrieval results. Very high correlations were found among the rankings of systems produced using different relevance judgment sets. The high correlations indicate that the comparative evaluation of retrieval performance is stable despite substantial differences in relevance judgments, and thus reaffirm the use of the TREC collections as laboratory tools. Published by Elsevier Science Ltd.
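The correlation comparison described above can be sketched in code. The snippet below computes Kendall's tau, a standard rank-correlation measure often used for comparing system rankings, between two hypothetical rankings of the same systems under different judgment sets; the system names, ranks, and the choice of tau are illustrative assumptions, not data or methodology reproduced from the study.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same set of systems.

    rank_a, rank_b: dicts mapping system name -> rank position (1 = best).
    Returns a value in [-1, 1]; 1 means identical orderings.
    """
    systems = list(rank_a)
    concordant = discordant = 0
    for s1, s2 in combinations(systems, 2):
        # A pair is concordant if both rankings order s1 and s2 the same way.
        da = rank_a[s1] - rank_a[s2]
        db = rank_b[s1] - rank_b[s2]
        if da * db > 0:
            concordant += 1
        elif da * db < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / n_pairs

# Hypothetical system rankings under two different relevance judgment sets.
official = {"sysA": 1, "sysB": 2, "sysC": 3, "sysD": 4}
alternate = {"sysA": 1, "sysB": 3, "sysC": 2, "sysD": 4}
print(kendall_tau(official, alternate))  # one swapped pair out of six -> 0.666...
```

A tau close to 1, as in this example where only one adjacent pair of systems swaps order, is the kind of outcome the abstract describes: the comparative ranking of systems is largely preserved even when the relevance judgments change.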