Variations in relevance judgments and the measurement of retrieval effectiveness

Authors
Em. Voorhees
Citation
Em. Voorhees, Variations in relevance judgments and the measurement of retrieval effectiveness, INF PR MAN, 36(5), 2000, pp. 697-716
Number of citations
17
Subject categories
Library & Information Science; Information Technology & Communication Systems
Journal title
INFORMATION PROCESSING & MANAGEMENT
ISSN journal
0306-4573
Volume
36
Issue
5
Year of publication
2000
Pages
697 - 716
Database
ISI
SICI code
0306-4573(200009)36:5<697:VIRJAT>2.0.ZU;2-C
Abstract
Test collections have traditionally been used by information retrieval researchers to improve their retrieval strategies. To be viable as a laboratory tool, a collection must reliably rank different retrieval variants according to their true effectiveness. In particular, the relative effectiveness of two retrieval strategies should be insensitive to modest changes in the relevant document set, since individual relevance assessments are known to vary widely. The test collections developed in the TREC workshops have become the collections of choice in the retrieval research community. To verify their reliability, NIST investigated the effect that changes in the relevance assessments have on the evaluation of retrieval results. Very high correlations were found among the rankings of systems produced using different relevance judgment sets. The high correlations indicate that the comparative evaluation of retrieval performance is stable despite substantial differences in relevance judgments, and thus reaffirm the use of the TREC collections as laboratory tools. Published by Elsevier Science Ltd.
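The stability test described in the abstract can be sketched as follows: score every system under two different relevance judgment sets, rank the systems by each score, and measure how well the two rankings agree. The abstract does not name the effectiveness measure or the correlation statistic, so this minimal sketch assumes mean average precision and Kendall's tau (tau-a, ties counted as neither concordant nor discordant). The system names, topics, documents, and judgments are invented toy data, not TREC results, and the function names are illustrative only.

```python
from itertools import combinations


def average_precision(ranked_docs, relevant):
    """Non-interpolated average precision of one ranked list (assumed measure)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """run: topic -> ranked doc list; qrels: topic -> set of relevant docs."""
    topics = list(qrels)
    return sum(average_precision(run.get(t, []), qrels[t]) for t in topics) / len(topics)


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two scorings of the same set of systems."""
    systems = list(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1  # both judgment sets order this pair the same way
        elif product < 0:
            discordant += 1  # the judgment sets disagree on this pair
    n_pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy data: three systems, two topics, two judgment sets.
runs = {
    "sysA": {"t1": ["d1", "d2", "d3", "d4"], "t2": ["d5", "d6", "d7", "d8"]},
    "sysB": {"t1": ["d2", "d4", "d1", "d3"], "t2": ["d6", "d8", "d5", "d7"]},
    "sysC": {"t1": ["d3", "d4", "d2", "d1"], "t2": ["d7", "d8", "d6", "d5"]},
}
qrels_1 = {"t1": {"d1", "d2"}, "t2": {"d5", "d6"}}
qrels_2 = {"t1": {"d1", "d2", "d4"}, "t2": {"d5"}}  # a different assessor's judgments

map_1 = {name: mean_average_precision(run, qrels_1) for name, run in runs.items()}
map_2 = {name: mean_average_precision(run, qrels_2) for name, run in runs.items()}
print(kendall_tau(map_1, map_2))  # 1.0 for this toy data
```

In this toy example the absolute scores change between judgment sets (sysB drops from about 0.83 to about 0.67), yet the ordering sysA > sysB > sysC is preserved, so tau is 1.0. That is the distinction the abstract draws: absolute effectiveness depends on the judgments, while the comparative ranking of systems can remain stable.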