Empirical work shows significant benefits from using relevance feedback data to improve information retrieval (IR) performance. Still, one fundamental difficulty has limited the ability to fully exploit this valuable data. The problem is that it is not clear whether the relevance feedback data should be used to train the system about what the users really mean, or about what the documents really mean. In this paper, we resolve the question using a maximum likelihood framework. We show how all the available data can be used to simultaneously estimate both documents and queries in proportions that are optimal in a maximum likelihood sense. The resulting algorithm is directly applicable to many approaches to IR, and the unified framework can help explain previously reported results as well as guide the search for new methods that utilize feedback data in IR.
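To make the idea of simultaneous estimation concrete, here is a minimal illustrative sketch (not the paper's actual algorithm): queries and documents are modeled as vectors, relevance is modeled as `P(relevant | q, d) = sigmoid(q · d)`, and both sets of vectors are updated by gradient ascent on the log-likelihood of the observed relevance judgments. The vector representation, the sigmoid link, and the learning-rate schedule are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    # Logistic link: maps a raw score to a relevance probability.
    return 1.0 / (1.0 + np.exp(-x))

def fit(Q, D, feedback, lr=0.1, epochs=200):
    """Jointly re-estimate query and document vectors from feedback.

    Q        -- (num_queries, dim) array of query vectors
    D        -- (num_docs, dim) array of document vectors
    feedback -- list of (query_idx, doc_idx, relevant) with relevant in {0, 1}

    Both Q and D are updated by stochastic gradient ascent on the
    Bernoulli log-likelihood of the relevance judgments, so the feedback
    data trains the query and document representations at the same time.
    """
    Q, D = Q.copy(), D.copy()
    for _ in range(epochs):
        for qi, di, r in feedback:
            p = sigmoid(Q[qi] @ D[di])
            g = r - p                    # d(log-likelihood)/d(score)
            q_old = Q[qi].copy()         # snapshot before updating
            Q[qi] += lr * g * D[di]
            D[di] += lr * g * q_old
    return Q, D
```

After fitting, relevant query-document pairs score higher and non-relevant pairs score lower; neither the query side nor the document side is given all of the feedback signal, which is the intuition behind estimating both in balanced proportions.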