In the TREC collection, a large full-text experimental collection with
widely varying document lengths, we observe that the likelihood of a
document being judged relevant by a user increases with the document
length. We show that a retrieval strategy, such as the vector-space
cosine match, that retrieves documents of different lengths with
roughly equal chances will not optimally retrieve useful documents
from such a collection. We present a modified technique, pivoted
cosine normalization, that attempts to match the likelihood of
retrieving documents of all lengths to the likelihood of their
relevance, and show that this technique yields significant
improvements in retrieval effectiveness.
Copyright (C) 1996 Elsevier Science Ltd
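
The abstract does not state the normalization formula, so the following is only a minimal sketch of how pivoted cosine normalization is commonly written: the cosine normalization factor is replaced by a line that is "tilted" around a pivot point. The linear form, the function names, and the pivot and slope values are assumptions chosen for illustration, not details taken from the text above.

```python
import math


def cosine_norm(weights):
    """Euclidean length of a document's term-weight vector."""
    return math.sqrt(sum(w * w for w in weights))


def pivoted_norm(old_norm, pivot, slope):
    """Tilt the normalization factor around `pivot` (assumed linear form).

    With slope < 1, documents whose old factor exceeds the pivot (long
    documents) are penalized less, and documents below the pivot (short
    documents) are penalized more. With slope == 1 this reduces to the
    unmodified cosine factor.
    """
    return (1.0 - slope) * pivot + slope * old_norm


def score(query, doc, pivot, slope):
    """Inner product of query and document weights over the pivoted factor.

    `query` and `doc` are hypothetical dicts mapping term -> weight.
    """
    dot = sum(qw * doc.get(term, 0.0) for term, qw in query.items())
    return dot / pivoted_norm(cosine_norm(doc.values()), pivot, slope)


# Toy data: one short and one long document (illustrative weights only).
short_doc = {"cat": 1.0}              # cosine factor 1.0
long_doc = {"cat": 3.0, "dog": 4.0}   # cosine factor 5.0
query = {"cat": 1.0}

# Assumed setting: pivot at the average cosine factor, slope of 0.5.
pivot = (cosine_norm(short_doc.values()) + cosine_norm(long_doc.values())) / 2
short_score = score(query, short_doc, pivot, slope=0.5)
long_score = score(query, long_doc, pivot, slope=0.5)
```

In this toy setting, plain cosine matching (slope of 1) ranks the short document first, while the tilted factor ranks the long document first. This only illustrates the direction of the adjustment the abstract describes; in practice the pivot and slope would have to be tuned on a collection.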