
DOCUMENT LENGTH NORMALIZATION

Authors

A. Singhal, G. Salton, M. Mitra, C. Buckley

Citation

A. Singhal et al., DOCUMENT LENGTH NORMALIZATION, Information Processing & Management, 32(5), 1996, pp. 619-633

Citations number

Subject categories

Information Science & Library Science; Computer Science, Information Systems

Journal title

Information Processing & Management

ISSN journal

03064573

Volume

32

Issue

5

Year of publication

1996

Pages

619 - 633

Database

ISI

SICI code

0306-4573(1996)32:5<619:DLN>2.0.ZU;2-W

Abstract

In the TREC collection, a large full-text experimental text collection with widely varying document lengths, we observe that the likelihood of a document being judged relevant by a user increases with the document length. We show that a retrieval strategy, such as the vector-space cosine match, that retrieves documents of different lengths with roughly equal chances will not optimally retrieve useful documents from such a collection. We present a modified technique, pivoted cosine normalization, that attempts to match the likelihood of retrieving documents of all lengths to the likelihood of their relevance, and show that this technique yields significant improvements in retrieval effectiveness.

Copyright (C) 1996 Elsevier Science Ltd
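The abstract names pivoted cosine normalization without stating its formula. Below is a minimal Python sketch, assuming the commonly cited form of the pivoted factor, (1 - slope) * pivot + slope * old_factor, where old_factor is the document's ordinary cosine normalization factor and the pivot is taken here as the collection mean of those factors. The function names, toy term weights, and the slope value 0.5 are hypothetical and illustrative only; this is not the authors' tuned TREC configuration.

```python
import math

def cosine_norm(weights):
    """Ordinary cosine normalization factor: Euclidean length of the weight vector."""
    return math.sqrt(sum(w * w for w in weights))

def pivoted_norm(old_norm, pivot, slope):
    """Pivoted factor (assumed form): (1 - slope) * pivot + slope * old_norm.

    With 0 < slope < 1, documents whose old_norm is below the pivot get a
    larger factor than under cosine normalization (lower scores), and
    documents above the pivot get a smaller one (higher scores).
    slope = 1 reproduces plain cosine normalization.
    """
    return (1.0 - slope) * pivot + slope * old_norm

def score(query_weights, doc_weights, pivot, slope):
    """Query-document inner product divided by the pivoted normalization factor."""
    dot = sum(q * doc_weights.get(term, 0.0) for term, q in query_weights.items())
    return dot / pivoted_norm(cosine_norm(doc_weights.values()), pivot, slope)

if __name__ == "__main__":
    # Hypothetical toy collection: one short and one long document.
    short_doc = {"a": 1.0}
    long_doc = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
    query = {"a": 1.0}

    # Pivot: mean cosine factor over the collection (an assumption for this sketch).
    pivot = (cosine_norm(short_doc.values()) + cosine_norm(long_doc.values())) / 2

    for name, doc in [("short", short_doc), ("long", long_doc)]:
        print(name,
              "cosine:", round(score(query, doc, pivot, 1.0), 4),
              "pivoted:", round(score(query, doc, pivot, 0.5), 4))
```

In this toy run the long document's score rises relative to plain cosine normalization while the short document's score falls, which is the tilt toward longer documents that the abstract motivates from the observed relevance-versus-length trend.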