Predicting the effectiveness of naive data fusion on the basis of system characteristics

Authors

Ng, KB; Kantor, PB

Citation

K.B. Ng and P.B. Kantor, Predicting the effectiveness of naive data fusion on the basis of system characteristics, J AM S INFO, 51(13), 2000, pp. 1177-1189

Citations number

Subject categories

Library & Information Science

Journal title

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE

ISSN journal

0002-8231

Volume

51

Issue

13

Year of publication

2000

Pages

1177 - 1189

Database

ISI

SICI code

0002-8231(200011)51:13<1177:PTEOND>2.0.ZU;2-U

Abstract

Effective automation of the information retrieval task has long been an active area of research, leading to sophisticated retrieval models. With many IR schemes available, researchers have begun to investigate the benefits of combining the results of different IR schemes to improve performance, in the process called "data fusion." There are many successful data fusion experiments reported in the literature, but there are also cases in which it did not work well. Thus, it would be quite valuable to have a theory that can predict, in advance, whether fusion of two or more retrieval schemes will be worth doing. In a previous study (Ng & Kantor, 1998), we identified two predictive variables for the effectiveness of fusion: (a) a list-based measure of output dissimilarity, and (b) a pair-wise measure of the similarity of performance of the two schemes. In this article we investigate the predictive power of these two variables in simple symmetrical data fusion. We use the systems participating in the TREC 4 routing task to train a model that predicts the effectiveness of data fusion, and use the systems participating in the TREC 5 routing task to test that model. The model asks, "when will fusion perform better than an oracle who uses the best scheme from each pair?" We explore statistical techniques for fitting the model to the training data and use the receiver operating characteristic curve of signal detection theory to represent the power of the resulting models. The trained prediction methods predict whether fusion will beat an oracle, at levels much higher than could be achieved by chance.
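
The abstract names two predictive variables and an ROC-based evaluation without giving their formulas. The sketch below is only an illustration of how such quantities might be computed, not the authors' definitions: `output_dissimilarity` assumes a Jaccard distance between two retrieved lists, `performance_ratio` assumes a lower-to-higher ratio of two performance scores, and `roc_points` and `auc` trace an ROC curve for any predictor score against the outcome "fusion beat the better scheme of the pair". All function names are hypothetical, and tied scores are not treated specially.

```python
def output_dissimilarity(list_a, list_b):
    """Assumed list-based dissimilarity: Jaccard distance between two result lists."""
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def performance_ratio(perf_a, perf_b):
    """Assumed pair-wise similarity of performance: lower score over higher score."""
    lo, hi = sorted((perf_a, perf_b))
    return lo / hi if hi > 0 else 1.0


def roc_points(scores, labels):
    """ROC points (false positive rate, true positive rate) from predictor scores.

    labels[i] is True when fusion beat the better scheme for pair i.
    """
    pos = sum(1 for label in labels if label)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Sweep the decision threshold from the highest score downward.
    for _, label in sorted(zip(scores, labels), key=lambda p: -p[0]):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An area of 0.5 corresponds to chance-level prediction, so a predictor that anticipates successful fusion "at levels much higher than could be achieved by chance" would show an area well above 0.5 under this kind of measure.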