K.B. Ng and P.B. Kantor, An investigation of the conditions for effective data fusion in information retrieval: A pilot study, P ASIS ANNU, 35, 1998, pp. 166-178
Effective automation of the information retrieval task has long been an active area of research, leading to sophisticated retrieval models. With many IR schemes available, researchers have begun to investigate the benefits of combining the results of different IR schemes to improve performance. Many successful data fusion experiments are reported in the IR literature, but there are also experiments in which data fusion did not work even though the same fusion rules were used. What is needed is a theory that tells a priori when one should use data fusion methods. We categorize the different theoretical justifications of data fusion into two approaches, examine their implications, analyze some of the unsuccessful data fusion experiments, and propose two conditions for effective data fusion: (1) the condition of efficacy and (2) the condition of dissimilarity. We have developed a mathematical measure of inter-scheme dissimilarity (Pair-out-of-order), and have developed algorithms and computer programs to implement our ideas. We report on a pilot test using the output lists of all IR schemes that participated in the Routing task of TREC 4. Our results indicate that efficacy and inter-scheme dissimilarity are good predictors of the effectiveness of data fusion. In addition, we find that a model using the ratio of the efficacies of two schemes can improve our ability to predict fusion effectiveness.
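The abstract names the Pair-out-of-order measure but does not define it. One plausible reading is the fraction of document pairs that two IR schemes rank in opposite order, which is the same idea as a normalized Kendall tau distance. The sketch below assumes that reading; the function name `pair_out_of_order`, the score-dictionary input format, skipping tied pairs, and restricting to documents scored by both schemes are all assumptions for illustration, not the authors' published algorithm.

```python
from itertools import combinations


def pair_out_of_order(scores_a, scores_b):
    """Hypothetical sketch of a pair-out-of-order dissimilarity.

    scores_a, scores_b: dicts mapping document id -> retrieval score from
    two IR schemes. Assumed definition: among document pairs scored by both
    schemes and not tied in either, return the fraction that the two schemes
    order differently (0.0 = identical ordering, 1.0 = fully reversed).
    """
    # Assumption: only documents retrieved (scored) by both schemes are compared.
    docs = sorted(set(scores_a) & set(scores_b))
    discordant = 0
    compared = 0
    for d1, d2 in combinations(docs, 2):
        diff_a = scores_a[d1] - scores_a[d2]
        diff_b = scores_b[d1] - scores_b[d2]
        if diff_a == 0 or diff_b == 0:
            continue  # assumption: pairs tied in either scheme are not counted
        compared += 1
        if (diff_a > 0) != (diff_b > 0):
            discordant += 1  # the two schemes put this pair in opposite order
    return discordant / compared if compared else 0.0


# Usage: three documents scored by three toy schemes.
a = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
b = {"d1": 1.0, "d2": 2.0, "d3": 3.0}  # exact reversal of a
c = {"d1": 3.0, "d2": 1.0, "d3": 2.0}  # swaps only d2 and d3 relative to a
print(pair_out_of_order(a, b))  # 1.0
print(pair_out_of_order(a, c))  # one of three pairs differs
```

Under this reading, a value near 0 means two schemes order documents almost identically, so fusing them would add little, which is the intuition behind the condition of dissimilarity.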