An investigation of the conditions for effective data fusion in information retrieval: A pilot study

Authors
K.B. Ng, P.B. Kantor
Citation
K.B. Ng and P.B. Kantor, An investigation of the conditions for effective data fusion in information retrieval: A pilot study, P ASIS ANNU, 35, 1998, pp. 166-178
Number of citations
27
Subject Categories
Library & Information Science
Journal title
PROCEEDINGS OF THE ASIS ANNUAL MEETING
ISSN journal
0044-7870
Volume
35
Year of publication
1998
Pages
166 - 178
Database
ISI
SICI code
0044-7870(1998)35:<166:AIOTCF>2.0.ZU;2-M
Abstract
Effective automation of the information retrieval task has long been an active area of research, leading to sophisticated retrieval models. With many IR schemes available, researchers have begun to investigate the benefits of combining the results of different IR schemes to improve performance. There are many successful data fusion experiments reported in the IR literature, but there are also experiments in which data fusion did not work even though the same fusion rules were used. What is needed is a theory that tells us a priori when data fusion methods should be used. We categorize the different theoretical justifications of data fusion into two approaches, examine their implications, analyze some of the unsuccessful data fusion experiments, and propose two conditions for effective data fusion: (1) the condition of efficacy and (2) the condition of dissimilarity. We have developed a mathematical measure (Pair-out-of-order) of inter-scheme dissimilarity, and have developed algorithms and computer programs to implement our ideas. We report on a pilot test using the output lists of all IR schemes that participated in the Routing task of TREC 4. Our results indicate that efficacy and inter-scheme dissimilarity are good predictors of the effectiveness of data fusion. In addition, we find that a model using the ratio of the efficacies of two schemes can improve our ability to predict fusion effectiveness.
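The abstract names the Pair-out-of-order measure but does not define it. One plausible reading, offered here only as an assumption, is the fraction of document pairs that two IR schemes rank in opposite order, which is essentially a normalized Kendall-tau distance between their output lists. The sketch below implements that reading. The function name `pair_out_of_order`, the choice to compare only documents scored by both schemes, and the rule of skipping tied pairs are all hypothetical and may differ from the authors' actual definition.

```python
from itertools import combinations
from typing import Dict


def pair_out_of_order(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> float:
    """Fraction of document pairs that schemes A and B order differently.

    Hypothetical reconstruction (not the paper's published definition):
    - only documents scored by both schemes are compared;
    - pairs tied under either scheme are skipped, since they carry no order.
    Returns 0.0 for identical orderings and 1.0 for fully reversed ones.
    """
    common = sorted(set(scores_a) & set(scores_b))
    discordant = 0  # pairs ranked in opposite order by the two schemes
    compared = 0    # pairs with a strict order under both schemes
    for d1, d2 in combinations(common, 2):
        diff_a = scores_a[d1] - scores_a[d2]
        diff_b = scores_b[d1] - scores_b[d2]
        if diff_a == 0 or diff_b == 0:
            continue  # tie in at least one scheme: skip this pair
        compared += 1
        if diff_a * diff_b < 0:  # opposite signs mean the pair is out of order
            discordant += 1
    return discordant / compared if compared else 0.0


if __name__ == "__main__":
    # Illustrative scores for three documents (made-up values, not TREC data).
    scheme_a = {"d1": 0.9, "d2": 0.7, "d3": 0.4}
    scheme_b = {"d1": 0.2, "d2": 0.8, "d3": 0.5}
    print(round(pair_out_of_order(scheme_a, scheme_b), 3))  # prints 0.667
```

Under this reading, a value near 0 means the two schemes largely agree on the ordering, so fusing them is unlikely to add much. A larger value signals the kind of inter-scheme dissimilarity that the abstract identifies as one of the conditions for effective fusion.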