DOCUMENT LENGTH NORMALIZATION

Citation
A. Singhal et al., DOCUMENT LENGTH NORMALIZATION, Information Processing & Management, 32(5), 1996, pp. 619-633
Number of citations
20
Subject Categories
Information Science & Library Science; Computer Science Information Systems
ISSN journal
03064573
Volume
32
Issue
5
Year of publication
1996
Pages
619 - 633
Database
ISI
SICI code
0306-4573(1996)32:5<619:DLN>2.0.ZU;2-W
Abstract
In the TREC collection, a large full-text experimental text collection with widely varying document lengths, we observe that the likelihood of a document being judged relevant by a user increases with the document length. We show that a retrieval strategy, such as the vector-space cosine match, that retrieves documents of different lengths with roughly equal chances, will not optimally retrieve useful documents from such a collection. We present a modified technique, pivoted cosine normalization, that attempts to match the likelihood of retrieving documents of all lengths to the likelihood of their relevance, and show that this technique yields significant improvements in retrieval effectiveness. Copyright (C) 1996 Elsevier Science Ltd
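
The abstract names pivoted cosine normalization but does not state its formula. The following is a minimal sketch, assuming the commonly cited linear form in which the cosine normalization factor is tilted around a pivot: pivoted = (1 - slope) * pivot + slope * old_norm. The toy documents, the choice of the mean cosine norm as the pivot, and the slope value 0.75 are illustrative assumptions, not values taken from this record.

```python
import math


def cosine_norm(weights):
    """Euclidean length of a document's term-weight vector."""
    return math.sqrt(sum(w * w for w in weights))


def pivoted_norm(old_norm, pivot, slope):
    """Tilt the normalization factor around the pivot (assumed linear form)."""
    return (1.0 - slope) * pivot + slope * old_norm


# Hypothetical term-weight vectors for documents of increasing length.
docs = {
    "short": [1.0, 1.0],
    "medium": [1.0, 1.0, 1.0, 1.0],
    "long": [1.0] * 16,
}

norms = {name: cosine_norm(w) for name, w in docs.items()}

# Assumption: use the collection's mean cosine norm as the pivot.
pivot = sum(norms.values()) / len(norms)
slope = 0.75  # assumed value; in practice this would be tuned

pivoted = {name: pivoted_norm(n, pivot, slope) for name, n in norms.items()}

for name in docs:
    print(f"{name}: cosine={norms[name]:.3f} pivoted={pivoted[name]:.3f}")
```

With a slope below 1, documents longer than the pivot receive a smaller normalization factor than plain cosine would give them (so their scores rise), and shorter documents receive a larger one, which is the direction of correction the abstract describes.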