An investigation of the conditions for effective data fusion in information retrieval: A pilot study

Authors

Ng, KB; Kantor, PB

Citation

K.B. Ng and P.B. Kantor, An investigation of the conditions for effective data fusion in information retrieval: A pilot study, P ASIS ANNU, 35, 1998, pp. 166-178

Citations number

Subject categories

Library & Information Science

Journal title

PROCEEDINGS OF THE ASIS ANNUAL MEETING

ISSN journal

00447870

Volume

Year of publication

1998

Pages

166 - 178

Database

ISI

SICI code

0044-7870(1998)35:<166:AIOTCF>2.0.ZU;2-M

Abstract

Effective automation of the information retrieval task has long been an active area of research, leading to sophisticated retrieval models. With many IR schemes available, researchers have begun to investigate the benefits of combining the results of different IR schemes to improve performance. There are many successful data fusion experiments reported in IR literature, but there are also experiments in which data fusion did not work while using the same fusion rules. What is needed is a theory to tell a priori when one should use data fusion methods. We categorize different theoretical justifications of data fusion into two approaches, examine their implications, analyze some of the unsuccessful data fusion experiments, and propose two conditions for effective data fusion: (1) The condition of efficacy and (2) The condition of dissimilarity. We have developed a mathematical measure (Pair-out-of-order) to measure inter-scheme dissimilarity, and have developed algorithms and computer programs to implement our ideas. We report on a pilot test using the output lists of all IR schemes which participated in the Routing task of TREC 4. Our result indicates that the efficacy and inter-scheme dissimilarity are good predictors for effectiveness of data fusion. In addition, we find that a model using the ratio of efficacies of two schemes can improve our ability to predict fusion effectiveness.
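
The abstract names a Pair-out-of-order measure of inter-scheme dissimilarity but does not define it in this record. As a rough illustration only, the sketch below assumes the measure is the fraction of shared document pairs that two ranked output lists place in opposite order (a Kendall-tau-style count). The function name `pairs_out_of_order` and this normalization are hypothetical and may differ from the authors' definition.

```python
from itertools import combinations


def pairs_out_of_order(ranking_a, ranking_b):
    """Fraction of shared document pairs ordered oppositely by two rankings.

    Assumption: a normalized discordant-pair count; the paper's actual
    Pair-out-of-order definition is not given in this record.
    """
    # Map each document ID to its rank position in each list.
    pos_a = {doc: i for i, doc in enumerate(ranking_a)}
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}
    # Only documents retrieved by both schemes can be compared.
    shared = [doc for doc in ranking_a if doc in pos_b]

    total = 0
    discordant = 0
    for x, y in combinations(shared, 2):
        total += 1
        # A negative product means the two schemes disagree on the order.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant / total if total else 0.0


if __name__ == "__main__":
    scheme_1 = ["d1", "d2", "d3"]
    scheme_2 = ["d2", "d1", "d3"]
    print(pairs_out_of_order(scheme_1, scheme_2))
```

Under this assumption, identical rankings score 0.0 and fully reversed rankings score 1.0, so larger values indicate more dissimilar schemes.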