
Incorporating window-based passage-level evidence in document retrieval

Authors

Xi, WS; Xu-Rong, R; Khoo, CSG; Lim, EP

Citation

W.S. Xi et al., Incorporating window-based passage-level evidence in document retrieval, J INF SCI, 27(2), 2001, pp. 73-80

Citations number

Subject categories

Library & Information Science

Journal title

JOURNAL OF INFORMATION SCIENCE

ISSN journal

0165-5515

Volume

27

Issue

2

Year of publication

2001

Pages

73 - 80

Database

ISI

SICI code

0165-5515(2001)27:2<73:IWPEID>2.0.ZU;2-1

Abstract

This study investigated whether document retrieval can be improved if documents are divided into smaller sub-documents or passages and the retrieval scores for these passages are incorporated into the final retrieval score for the whole document. The documents were segmented by sliding a window of a certain size across the document and extracting the words displayed each time the window stopped. A retrieval score was calculated for each of the passages extracted, and the highest score obtained by a passage of that size was taken as the document's passage-level score for that window size. A range of window sizes was tried. The experimental results indicated that a fixed window size of 50 words gave better results than other window sizes for the TREC-5 and TREC-6 test collections. This window size yielded a significant retrieval improvement of 24% compared to using the whole-document retrieval score (using the traditional tf*idf weighting scheme with cosine normalisation). However, combining this window score with the whole-document retrieval score did not yield a retrieval improvement. Using a variable window size (ranging from 50 to 400 words) yielded a retrieval improvement of about 5% over using a fixed window size of 50 words. Different window sizes were found to work best for different queries. If the best window size for each query could be predicted accurately, a maximum retrieval improvement of 42% could be obtained. Subsequent work suggests that the usefulness of passage-level evidence in document retrieval depends on the weighting scheme and type of normalisation used in the retrieval method.
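The fixed-window passage score described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the tf*idf variant (raw term frequency times log(N/df)), the window step (a hypothetical default of half the window size, since the abstract does not say how far the window moves between stops), and all function names are assumptions. The passage-level score is the highest cosine score over all windows, and the whole-document cosine score is the baseline it is compared against.

```python
import math
from collections import Counter


def idf_table(docs):
    """idf = log(N / df) over tokenised documents (one common variant, assumed here)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term frequency times idf; terms absent from the collection get weight 0."""
    return {t: f * idf.get(t, 0.0) for t, f in Counter(tokens).items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def windows(tokens, size, step):
    """Slide a window of `size` words across `tokens`, advancing `step` words per stop."""
    if len(tokens) <= size:
        return [tokens]  # short document: the whole document is the only passage
    passages = [tokens[s:s + size] for s in range(0, len(tokens) - size + 1, step)]
    last = len(tokens) - size
    if last % step != 0:
        passages.append(tokens[last:])  # extra final window so the tail is covered
    return passages


def passage_score(query, doc, idf, size=50, step=None):
    """Highest cosine score over all windows of `size` words (the passage-level score)."""
    step = step or max(1, size // 2)  # hypothetical default: 50% overlap between windows
    q = tfidf_vector(query, idf)
    return max(cosine(q, tfidf_vector(p, idf)) for p in windows(doc, size, step))


def document_score(query, doc, idf):
    """Whole-document tf*idf cosine score (the baseline the abstract compares against)."""
    return cosine(tfidf_vector(query, idf), tfidf_vector(doc, idf))
```

For example, `passage_score(query, doc, idf, size=50)` computes the 50-word window score for one document. Because it takes the maximum over windows, a long document whose query terms cluster in one region can score much higher than its whole-document cosine score, where the normalisation is diluted by the rest of the text. A document no longer than the window has a single passage, so its two scores coincide.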